You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Yuqi Du (Code Review)" <ge...@cloudera.org> on 2022/08/19 08:37:02 UTC

[kudu-CR] [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

Yuqi Du has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18867


Change subject: [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread
......................................................................

[threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

When shutdown thread pool with SchedulerThread, thread pool's variable 'scheduler_' is
deleted and 'scheduler_' = nullptr, at the same time, thread pool token not shutdowned and
Schedule a task, this may cause a coredump because of 'scheduler_' = nullptr.

This patch add a mutex lock in class 'ThreadPool' to protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
2 files changed, 30 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 1
Gerrit-Owner: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#5).

Change subject: [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread
......................................................................

[threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

When shutdown thread pool with SchedulerThread, thread pool's variable 'scheduler_' is
deleted and 'scheduler_' = nullptr, at the same time, thread pool token not shutdowned and
Schedule a task, this may cause a coredump because of 'scheduler_' = nullptr.

This patch add a mutex lock in class 'ThreadPool' to protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 57 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 5
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread
......................................................................


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18867/7//COMMIT_MSG
Commit Message:

PS7: 
Please make the commit description to conform with the guidelines at https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project#_commit_guidelines (which is referenced at https://kudu.apache.org/docs/contributing.html#_submitting_patches)


http://gerrit.cloudera.org:8080/#/c/18867/7//COMMIT_MSG@7
PS7, Line 7: extreme
What's extreme here?


http://gerrit.cloudera.org:8080/#/c/18867/7/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/7/src/kudu/util/threadpool-test.cc@186
PS7, Line 186: ForExtreme
What's 'extreme'?


http://gerrit.cloudera.org:8080/#/c/18867/7/src/kudu/util/threadpool-test.cc@187
PS7, Line 187:   VLOG(1) << "hello from task: " << i;
What is this for?


http://gerrit.cloudera.org:8080/#/c/18867/7/src/kudu/util/threadpool-test.cc@202
PS7, Line 202: || status.IsIllegalState()
Why to expect Status::IllegalState() here?  Please add a comment to clarify.



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 7
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Thu, 01 Sep 2022 02:10:16 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread
......................................................................


Patch Set 6:

(5 comments)

Thanks your crs.

http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool-test.cc@189
PS5, Line 189:   int delay_ms = 1 + (static_cast<int>(current_time) % 3);
> what's the purpose pf mode 3?
To reduce ScheduleThread's length, to avoid queue too large. Fix it another form.


http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool-test.cc@206
PS5, Line 206: 
> why sleep here? is there any other way to make sure all works done?
To SchedulerThread run 4-5 times, the purpose not do all the works.


http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool-test.cc@232
PS5, Line 232: 
> nit: you can use ASSERT_OK dirrectly, then you can get Status detailes if a
Done


http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool.h
File src/kudu/util/threadpool.h:

http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool.h@618
PS5, Line 618:  queue_time_t
> nit: 'scheduler_'.
Done


http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool.cc@418
PS5, Line 418:     MutexLock unique_lock(scheduler_lock_);
> scheduler_ can be obtained by scheduler(), is it possible to prevent it to 
'scheduler()' is not significate, it has deleted.



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 6
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 29 Aug 2022 09:11:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#4).

Change subject: [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread
......................................................................

[threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

When shutdown thread pool with SchedulerThread, thread pool's variable 'scheduler_' is
deleted and 'scheduler_' = nullptr, at the same time, thread pool token not shutdowned and
Schedule a task, this may cause a coredump because of 'scheduler_' = nullptr.

This patch add a mutex lock in class 'ThreadPool' to protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 55 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 4
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 17: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 17
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Sat, 15 Oct 2022 14:02:49 +0000
Gerrit-HasComments: No

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 18: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 18
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 06 Dec 2022 04:13:23 +0000
Gerrit-HasComments: No

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#16).

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................

[threadpool] Fix unsafe behaviour when SchedulerThread shutdown

When shutdown thread pool with SchedulerThread, thread pool's variable
'scheduler_' is deleted and set to nullptr, at the same time,
if thread pool token is not shutdown and schedule a task, this may cause
a coredump because 'scheduler_' is a nullptr.

This patch fix the bug, adding a mutex lock in class 'ThreadPool' to
protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 96 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/16
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 16
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 17: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 17
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 14 Nov 2022 06:56:27 +0000
Gerrit-HasComments: No

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18867/18/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/18/src/kudu/util/threadpool-test.cc@224
PS18, Line 224: token->Shutdown();
              :   ASSERT_OK(token->Schedule([&counter]() { SimpleTaskMethod(3, &counter); }, delay_ms));
> I'm confused about why we can still use the token schedule tasks while it h
One thread shutdown the 'token' and another thread can do 'token->Schedule'. The two lines is a mock scene for that case.


The second question 'Should the Schedule() return IllegalState?'. Yes, it can be written like this.
But there is not a very good way to do this to check token' state of shutdown.

MaySubmitNewTasks() can do this, but no lock protect 'state_' in ThreadPoolToken  and it uses the pool_'s lock (DoSubmit). we can not use the lock as 'DoSubmit' because
In ‘pool_->Scheduler()’? also use the lock again, that's a dead lock. Ignore the lock may be ok. state_ RUNNING to shutdown(QUIESCING, QUIESCED), read an old value, that's not a serious problem, but it's not reach our intention (check the state), we should also deal with the misjudgment.

When token is shutdown and  'token->Scheduler() ' return OK, that's not a problem. That means function task is pushed into thread pool's SchedulerThread queue, and check by SchedulerThread' logic (token->Submit can check it).



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 18
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 21 Dec 2022 13:54:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutting down a thread pool with SchedulerThread, thread pool's variable
            : 'scheduler_' is deleted and set to nullptr, at the same time, if thread pool
            : token is not shutdown and a task is scheduled on the toke
> There are 3 participants:
Thanks for your clarification!

I wonder if we can use the 'pool_status_' to check if new tasks can be scheduled, just like the usage in ThreadPool::DoSubmit: https://github.com/apache/kudu/blob/00e78f58978c6d4f00871eb8bd91c29327799c76/src/kudu/util/threadpool.cc#L532-L535.



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 17
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 07 Nov 2022 09:39:09 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix a bug when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#10).

Change subject: [threadpool] Fix a bug when SchedulerThread shutdown
......................................................................

[threadpool] Fix a bug when SchedulerThread shutdown

When shutdown thread pool with SchedulerThread, thread pool's variable
'scheduler_' is deleted and 'scheduler_' = nullptr, at the same time,
thread pool token not shutdowned and schedule a task, this may cause
a coredump because of 'scheduler_' = nullptr.

This patch add a mutex lock in class 'ThreadPool' to protect variable
'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 60 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/10
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 10
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#14).

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................

[threadpool] Fix unsafe behaviour when SchedulerThread shutdown

When shutdown thread pool with SchedulerThread, thread pool's variable
'scheduler_' is deleted and 'scheduler_' = nullptr, at the same time,
thread pool token not shutdowned and schedule a task, this may cause
a coredump because of 'scheduler_' = nullptr.

This patch fix the bug, adding a mutex lock in class 'ThreadPool' to
protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 97 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/14
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 14
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#15).

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................

[threadpool] Fix unsafe behaviour when SchedulerThread shutdown

When shutdown thread pool with SchedulerThread, thread pool's variable
'scheduler_' is deleted and 'scheduler_' = nullptr, at the same time,
thread pool token not shutdowned and schedule a task, this may cause
a coredump because of 'scheduler_' = nullptr.

This patch fix the bug, adding a mutex lock in class 'ThreadPool' to
protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 97 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/15
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 15
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 17:

(1 comment)

> Patch Set 17:
> 
> (1 comment)

Thank you for your advices. You are kind.

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutting down a thread pool with SchedulerThread, thread pool's variable
            : 'scheduler_' is deleted and set to nullptr, at the same time, if thread pool
            : token is not shutdown and a task is scheduled on the toke
> Thanks for your clarification!
I study the 'pool_status_' .  

At my called stacks after changes:

1. ThreadPoolToken::Schedule
2. ThreadPool::Schedule
3. scheduler_->Schedule

4. another thread run 'token->Submit(task.func());'


At 1,2,3. 'pool_status_' can be used at 2. But 2's purpose is protect 'scheduler_'. and at this position check 'pool_status_' is ok, but I think it's still not enough. because 'scheduler_' shutdown before thread pool. So if the following orders:

1. check thread pool' 'pool_status_' is ok/ not ok.
2. scheduler_ shutdown
3. thread pool shutdown

If 1's 'pool_status_' is not ok, we can return.
If 1's 'pool_status_'  is ok, then scheduler_->Scheduler, and aother thread 'scheduler_ shutdown'. only check 'pool_status_' is not enough.



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 17
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Sat, 12 Nov 2022 03:03:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutting down a thread pool with SchedulerThread, thread pool's variable
            : 'scheduler_' is deleted and set to nullptr, at the same time, if thread pool
            : token is not shutdown and a task is scheduled on the toke
> The first question, yes. The shutdown order is needed by before, not the pa
Could you explain what scenario we need to call token->Schedule() after shutting down the thread pool? As far as I can see, we won't call token->Submit() in the existing code base, and it's also unsafe to do so. That's to say, ThreadPoolToken is only valid for the lifetime of ThreadPool.



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 17
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 19 Oct 2022 08:05:49 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 18:

(1 comment)

Thank your for you advices.

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutting down a thread pool with SchedulerThread, thread pool's variable
            : 'scheduler_' is deleted and set to nullptr, at the same time, if thread pool
            : token is not shutdown and a task is scheduled on the toke
> Would it be possible to set 'pool_stats_' ServiceUnavailable before set 'sc
I have tried to do so as you said. It seems unsteady for test case: TestExtremeScheduler. Three results:

1. OK
2. running forever, not finish.
3. Failed (The case can be fixed)

The key problem is 2.  The log 'Waited for 2000ms trying to join with scheduler' means scheduler_.join.
so program is in thread pool's shutdown and hold the 'lock_', but why 'scheduler_' does not exit?
We can see 'SchedulerThread::RunLoop()':   
' token->Submit() '  it call 'ThreadPool::DoSubmit' which also want to hold 'lock_', deadlock.

So we can not do that only using 'lock_' simply.



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 18
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 16 Nov 2022 07:45:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 15: Code-Review+1

generally LGTM


-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 15
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 21 Sep 2022 17:03:45 +0000
Gerrit-HasComments: No

[kudu-CR] [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#3).

Change subject: [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread
......................................................................

[threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

When shutdown thread pool with SchedulerThread, thread pool's variable 'scheduler_' is
deleted and 'scheduler_' = nullptr, at the same time, thread pool token not shutdowned and
Schedule a task, this may cause a coredump because of 'scheduler_' = nullptr.

This patch add a mutex lock in class 'ThreadPool' to protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 51 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/3
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 3
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool-test.cc@189
PS5, Line 189:   if (static_cast<int>(current_time) % 3 == 0) {
what's the purpose pf mode 3?


http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool-test.cc@206
PS5, Line 206:   SleepFor(MonoDelta::FromMilliseconds(5));
why sleep here? is there any other way to make sure all works done?


http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool-test.cc@232
PS5, Line 232: ASSERT_TRUE
nit: you can use ASSERT_OK dirrectly, then you can get Status detailes if assert failed.


http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool.h
File src/kudu/util/threadpool.h:

http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool.h@618
PS5, Line 618: ‘scheduler_’.
nit: 'scheduler_'.

And it'd better to move it closer above to the place where 'scheduler_' is defined.


http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/5/src/kudu/util/threadpool.cc@418
PS5, Line 418:     MutexLock unique_lock(scheduler_lock_);
scheduler_ can be obtained by scheduler(), is it possible to prevent it to be deleted outside of class ThreadPool?



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 5
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Sat, 27 Aug 2022 09:28:50 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix a coredump when shutdown SchedulerThread

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#9).

Change subject: [threadpool] Fix a coredump when shutdown SchedulerThread
......................................................................

[threadpool] Fix a coredump when shutdown SchedulerThread

When shutdown thread pool with SchedulerThread,
thread pool's variable 'scheduler_' is deleted
and 'scheduler_' = nullptr, at the same time,
thread pool token not shutdowned and schedule a
task, this may cause a coredump because of
'scheduler_' = nullptr.

This patch add a mutex lock in class
'ThreadPool' to protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 60 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/9
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 9
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool-test.cc@a289
PS11, Line 289: 
Now this check will fail?


http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool-test.cc@188
PS11, Line 188: 1 + (static_cast<int>(current_time) % 3)
1 and 3 are magic numbers here, what do they mean? could you add some comments?


http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool-test.cc@195
PS11, Line 195: In SchedulerThread have many task need to scheduled
nit: how about 
There are many tasks in SchedulerThread need to be scheduled


http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool-test.cc@202
PS11, Line 202: time_t current_time = time(nullptr);
              :     int delay_ms = 1 + (static_cast<int>(current_time) % 3);
              :     Status status =
              :         token->Schedule(std::bind(&ContinuousIssueTraceStatement, token.get(), i), delay_ms);
              :     // At the case, 'pool_->Shutdown();' would shutdown its SchedulerThread firstly, after that
              :     // 'token->Scheduler(...)' would return a IllegalState.
              :     ASSERT_TRUE(status.ok() || status.IsIllegalState());
Duplicate code with function ContinuousIssueTraceStatement? Is it possible to improve the design to reduce duplicate code when we use 'Scheduler'? For example, the task submit to the token can be auto re-submit when it complete?


http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool-test.cc@236
PS11, Line 236:   ASSERT_OK(token->Schedule(&IssueTraceStatement, 1000));
Is it OK to schedule this task even though the token is shutdown?


http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool.cc@418
PS11, Line 418:     MutexLock unique_lock(scheduler_lock_);
In ThreadPool::WaitForScheduler, 'scheduler_' is also used for some purpose, why not protect it?


http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool.cc@418
PS11, Line 418: unique_lock
nit: In L530, the guard is using a simple name 'l', it suggest to use the same name for the same lock.



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 11
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 05 Sep 2022 16:37:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................

[threadpool] Fix unsafe behaviour when SchedulerThread shutdown

When shutting down a thread pool with SchedulerThread, thread pool's variable
'scheduler_' is deleted and set to nullptr, at the same time, if thread pool
token is not shutdown and a task is scheduled on the token, this may cause
a coredump because 'scheduler_' is a nullptr.

This patch fixes the bug, adding a mutex lock in class 'ThreadPool' to
protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Reviewed-on: http://gerrit.cloudera.org:8080/18867
Tested-by: Kudu Jenkins
Reviewed-by: Yingchun Lai <ac...@gmail.com>
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 133 insertions(+), 51 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Yingchun Lai: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 19
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#11).

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................

[threadpool] Fix unsafe behaviour when SchedulerThread shutdown

When shutdown thread pool with SchedulerThread, thread pool's variable
'scheduler_' is deleted and 'scheduler_' = nullptr, at the same time,
thread pool token not shutdowned and schedule a task, this may cause
a coredump because of 'scheduler_' = nullptr.

This patch fix the bug, adding a mutex lock in class 'ThreadPool' to
protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 60 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/11
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 11
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#2).

Change subject: [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread
......................................................................

[threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

When shutdown thread pool with SchedulerThread, thread pool's variable 'scheduler_' is
deleted and 'scheduler_' = nullptr, at the same time, thread pool token not shutdowned and
Schedule a task, this may cause a coredump because of 'scheduler_' = nullptr.

This patch add a mutex lock in class 'ThreadPool' to protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 49 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/2
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 2
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutdown thread pool with SchedulerThread, thread pool's variable
            : 'scheduler_' is deleted and set to nullptr, at the same time,
            : if thread pool token is not shutdown and schedule a task,
Since the ThreadPoolToken is created after ThreadPool is created, when we need to shut down a ThreadPool, we should shut down ThreadPoolToken first, right? So I think callers should be restricted from calling ThreadPoolToken::Schedule() after shutting down the thread pool.


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h
File src/kudu/util/threadpool.h:

http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h@612
PS16, Line 612:   SchedulerThread* scheduler_;
Can the 'scheduler' be a unique_ptr so we don't need to explicitly delete it? Seems it is owned by the ThreadPool. If not, could you document why it must be a raw pointer?


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc@263
PS16, Line 263: {
              :     MutexLock unique_lock(pool_->lock_);
              :     if (PREDICT_FALSE(!MaySubmitNewTasks())) {
              :       return Status::ServiceUnavailable("Thread pool token was shut down");
              :     }
              :   }
nit: Maybe we can move this check into ThreadPool::Schedule?



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 16
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 26 Sep 2022 09:35:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 13:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.h
File src/kudu/util/threadpool.h:

http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.h@187
PS13, Line 187:   struct SchedulerTask {
              :     ThreadPoolToken* thread_pool_token;
              :     std::function<void()> f;
              :   };
Is this structure really needs to be public?

If indeed so, then please document this structure: what is it for and how its fields are used.  Can 'thread_pool_token' be null?


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.h@192
PS13, Line 192:   void Schedule(ThreadPoolToken* token,
              :                 std::function<void()> f,
              :                 const MonoTime& execute_time) {
Please document this public method as well: what it does and how the parameters are used.  Can 'token' be null?  If yes, add a test scenario where nullptr is passed for 'token'.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.h@276
PS13, Line 276:   Status Schedule(ThreadPoolToken* token, std::function<void()> f, MonoTime execute_time);
Please follow the pattern for other methods in this interface and add a concise description of what this method does.

Can 'token' be null?  If not, then why to add Schedule() to ThreadPool API, not just ThreadPoolToken?  If yes, then please add a test scenario where 'token' is passed as 'nullptr'.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.h@506
PS13, Line 506:   void WaitForScheduler();
Where is this method used in addition to the ThreadPoolTest.TestSimpleTasks test scenario?  If nowhere else then why to introduce this method at all?


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.h@608
PS13, Line 608:   // Protect 'scheduler_'.
Since WaitForScheduler() acquires another lock along with 'scheduler_lock_', please add a comment about the right sequence of doing so to avoid possible deadlocks.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@678
PS13, Line 678: pending tasks finish
for pending tasks to finish


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@678
PS13, Line 678: , for example
              :   // at unit tests.
I'm not sure this reference to unit tests is quite relevant here.  Maybe, add similar comment into the unit tests themselves?


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@680
PS13, Line 680: MutexLock l(scheduler_lock_);
From what I can see in the code, ThreadPool::WaitForScheduler() isn't used anywhere but just in a ThreadPoolTest.TestSimpleTasks.  With that, how do we protect against "deadlock" situations where first ThreadPool::WaitForScheduler() is called and then ThreadPool::Shutdown() is called while WaitForScheduler() still holds the scheduler_lock_ and waits on the 'idle_cond_', while there is a task which reschedules itself again and again?



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 13
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 14 Sep 2022 21:27:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Yingchun Lai, Yifan Zhang, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#18).

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................

[threadpool] Fix unsafe behaviour when SchedulerThread shutdown

When shutting down a thread pool with SchedulerThread, thread pool's variable
'scheduler_' is deleted and set to nullptr, at the same time, if thread pool
token is not shutdown and a task is scheduled on the token, this may cause
a coredump because 'scheduler_' is a nullptr.

This patch fixes the bug, adding a mutex lock in class 'ThreadPool' to
protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 133 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/18
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 18
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 12:

(7 comments)

Thanks your crs.

http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool-test.cc@a289
PS11, Line 289: 
> Now this check will fail?
No. 'pool_->scheduler()' the method has removed.
add it back.


http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool-test.cc@188
PS11, Line 188: a random time: 1ms or 2ms or 3ms.
> 1 and 3 are magic numbers here, what do they mean? could you add some comme
ok, it;s a random delay time from 1, 2, 3.


http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool-test.cc@195
PS11, Line 195: 
> nit: how about 
Done


http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool-test.cc@202
PS11, Line 202: SERT_OK(RebuildPoolWithScheduler(1, 1, 1));
              :   unique_ptr<ThreadPoolToken> token = pool_->NewToken(ThreadPool::ExecutionMode::SERIAL);
              :   for (int i = 0; i < 2048; i++) {
              :     ContinuousIssueTraceStatement(token.get());
              :   }
              :   SleepFor(MonoDelta::FromMilliseconds(5));
              :   pool_->Shutdown();
> Duplicate code with function ContinuousIssueTraceStatement? Is it possible 
Done


http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool-test.cc@236
PS11, Line 236: 
> Is it OK to schedule this task even though the token is shutdown?
according to syntax, It should not ok. Fix it.


http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool.cc@418
PS11, Line 418: 
> In ThreadPool::WaitForScheduler, 'scheduler_' is also used for some purpose
Should protect it.


http://gerrit.cloudera.org:8080/#/c/18867/11/src/kudu/util/threadpool.cc@418
PS11, Line 418: 
> nit: In L530, the guard is using a simple name 'l', it suggest to use the s
done



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 12
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 13 Sep 2022 07:05:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#12).

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................

[threadpool] Fix unsafe behaviour when SchedulerThread shutdown

When shutdown thread pool with SchedulerThread, thread pool's variable
'scheduler_' is deleted and 'scheduler_' = nullptr, at the same time,
thread pool token not shutdowned and schedule a task, this may cause
a coredump because of 'scheduler_' = nullptr.

This patch fix the bug, adding a mutex lock in class 'ThreadPool' to
protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 71 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/12
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 12
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 16:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutdown thread pool with SchedulerThread
nit: When shutting down a thread pool with SchedulerThread


http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@11
PS16, Line 11: schedule a task
nit: a task is scheduled on the token


http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@14
PS16, Line 14: fix
nit: fixes


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool-test.cc@159
PS16, Line 159:   SleepFor(MonoDelta::FromMilliseconds(2 * kDelayMs));
              :   pool_->Wait();
Is it essential to have both SleepFor() and pool_->Wait() here?  Wouldn't just pool_-Wait() be enough?  Please add a comment to clarify.


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool-test.cc@256
PS16, Line 256:   // Wait the schedule task execute.
              :   SleepFor(MonoDelta::FromMilliseconds(200));
How essential is this extra wait?  It seems there is pool_->Wait() below, so I'd guess that should be enough, no?  Otherwise, please add a comment to explain why this is necessary.

Also, with this extra wait, is the test stable enough in case of scheduling anomalies?  E.g. try to run with --stress_cpu_threads=16 at an inferior hardware and see if it starts failing.


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h
File src/kudu/util/threadpool.h:

http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h@187
PS16, Line 187:   struct SchedulerTask {
              :     ThreadPoolToken* thread_pool_token;
              :     std::function<void()> f;
              :   };
Is it possible to move this into the 'private' section of the class?

If not, it would be great to document the fields.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@680
PS13, Line 680:   }
> Yes, you are right. My origin intention is, users who use it, should know t
This sounds like a good idea -- if you find this is needed elsewhere but not just in tests, you can introduce this or similar method back with proper API scope (e.g. this method is private/protected and used only in some higher-level wrapper method).

Thanks!


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc@540
PS16, Line 540: scheduler_->Schedule(token, std::move(f), execute_time);
Should this be wrapped into RETURN_NOT_OK()?  Maybe, just

return scheduler_->Schedule(token, std::move(f), execute_time);



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 16
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 26 Sep 2022 01:38:49 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#6).

Change subject: [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread
......................................................................

[threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

When shutdown thread pool with SchedulerThread, thread pool's variable 'scheduler_' is
deleted and 'scheduler_' = nullptr, at the same time, thread pool token not shutdowned and
Schedule a task, this may cause a coredump because of 'scheduler_' = nullptr.

This patch add a mutex lock in class 'ThreadPool' to protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 53 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/6
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 6
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18867/18/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/18/src/kudu/util/threadpool-test.cc@224
PS18, Line 224: token->Shutdown();
              :   ASSERT_OK(token->Schedule([&counter]() { SimpleTaskMethod(3, &counter); }, delay_ms));
I'm confused about why we can still use the token schedule tasks while it has been shut down. Should the Schedule() return IllegalState?



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 18
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 07 Dec 2022 11:06:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Yingchun Lai, Yifan Zhang, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#17).

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................

[threadpool] Fix unsafe behaviour when SchedulerThread shutdown

When shutting down a thread pool with SchedulerThread, thread pool's variable
'scheduler_' is deleted and set to nullptr, at the same time, if thread pool
token is not shutdown and a task is scheduled on the token, this may cause
a coredump because 'scheduler_' is a nullptr.

This patch fixes the bug, adding a mutex lock in class 'ThreadPool' to
protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 112 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/17
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 17
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18867/2/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/2/src/kudu/util/threadpool-test.cc@20
PS2, Line 20: #include <ctime>
> warning: inclusion of deprecated C++ header 'time.h'; consider using 'ctime
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 3
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 23 Aug 2022 16:27:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#13).

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................

[threadpool] Fix unsafe behaviour when SchedulerThread shutdown

When shutdown thread pool with SchedulerThread, thread pool's variable
'scheduler_' is deleted and 'scheduler_' = nullptr, at the same time,
thread pool token not shutdowned and schedule a task, this may cause
a coredump because of 'scheduler_' = nullptr.

This patch fix the bug, adding a mutex lock in class 'ThreadPool' to
protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 76 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/13
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 13
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix a coredump bug when shutdown SchedulerThread

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#8).

Change subject: [threadpool] Fix a coredump bug when shutdown SchedulerThread
......................................................................

[threadpool] Fix a coredump bug when shutdown SchedulerThread

When shutdown thread pool with SchedulerThread, thread pool's
variable 'scheduler_' is deleted and 'scheduler_' = nullptr,
at the same time, thread pool token not shutdowned and
schedule a task, this may cause a coredump because of
'scheduler_' = nullptr.

This patch add a mutex lock in class 'ThreadPool' to
protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 60 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/8
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 8
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 17:

(1 comment)

> Patch Set 17:
> 
> (1 comment)
> 
> > Patch Set 17:
> > 
> > (1 comment)
> 
> Thank you for your advices. You are kind.

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutting down a thread pool with SchedulerThread, thread pool's variable
            : 'scheduler_' is deleted and set to nullptr, at the same time, if thread pool
            : token is not shutdown and a task is scheduled on the toke
> I study the 'pool_status_' .  
Would it be possible to set 'pool_stats_' ServiceUnavailable before set 'scheduler' nullptr? And could the 'scheduler_' be protected by the 'lock_' too? Why is it necessary to have a separate lock?



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 17
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 14 Nov 2022 09:27:56 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix a coredump bug when shutdown SchedulerThread

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix a coredump bug when shutdown SchedulerThread
......................................................................


Patch Set 8:

(5 comments)

According to your advices, I have fixed them.
Thanks for your crs and time.
You can review again.

http://gerrit.cloudera.org:8080/#/c/18867/7//COMMIT_MSG
Commit Message:

PS7: 
> Please make the commit description to conform with the guidelines at https:
I have read the docs several times and not very clear to your means.
The error is a line words too long?
I think you can directly point out errors that should fix.


http://gerrit.cloudera.org:8080/#/c/18867/7//COMMIT_MSG@7
PS7, Line 7: oredump
> What's extreme here?
Fixed the title.


http://gerrit.cloudera.org:8080/#/c/18867/7/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/7/src/kudu/util/threadpool-test.cc@186
PS7, Line 186: eStatement
> What's 'extreme'?
Fixed the name.


http://gerrit.cloudera.org:8080/#/c/18867/7/src/kudu/util/threadpool-test.cc@187
PS7, Line 187:   time_t current_time = time(nullptr);
> What is this for?
remote it.


http://gerrit.cloudera.org:8080/#/c/18867/7/src/kudu/util/threadpool-test.cc@202
PS7, Line 202: me(nullptr);
> Why to expect Status::IllegalState() here?  Please add a comment to clarify
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 8
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Thu, 01 Sep 2022 03:14:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 15:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18867/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18867/15//COMMIT_MSG@10
PS15, Line 10: 'scheduler_' =
ni: set to


http://gerrit.cloudera.org:8080/#/c/18867/15//COMMIT_MSG@11
PS15, Line 11: thread
if thread


http://gerrit.cloudera.org:8080/#/c/18867/15//COMMIT_MSG@11
PS15, Line 11: not shutdowned
is not shutdown


http://gerrit.cloudera.org:8080/#/c/18867/15//COMMIT_MSG@12
PS15, Line 12: of 'scheduler_' =
scheduler_ is a


http://gerrit.cloudera.org:8080/#/c/18867/15/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/15/src/kudu/util/threadpool-test.cc@184
PS15, Line 184: time_t current_time = time(nullptr);
How about use class Random to generate random data, the test may run very fast that `current_time` may be the same.


http://gerrit.cloudera.org:8080/#/c/18867/15/src/kudu/util/threadpool-test.cc@204
PS15, Line 204: SleepFor(MonoDelta::FromMilliseconds(5));
It it used to ensure the test can be finished safetly, what will happen if removing it?


http://gerrit.cloudera.org:8080/#/c/18867/15/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/15/src/kudu/util/threadpool.cc@538
PS15, Line 538: is shutdown
nit: has been shutdown?



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 15
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 21 Sep 2022 17:03:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 16: Code-Review+1

(7 comments)

Thanks your crs, you are very kind.
Thank you.

http://gerrit.cloudera.org:8080/#/c/18867/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18867/15//COMMIT_MSG@10
PS15, Line 10: set to nullptr
> ni: set to
Done


http://gerrit.cloudera.org:8080/#/c/18867/15//COMMIT_MSG@11
PS15, Line 11: if thr
> if thread
Done


http://gerrit.cloudera.org:8080/#/c/18867/15//COMMIT_MSG@11
PS15, Line 11: en is not shut
> is not shutdown
Done


http://gerrit.cloudera.org:8080/#/c/18867/15//COMMIT_MSG@12
PS15, Line 12: 'scheduler_' is a
> scheduler_ is a
Done


http://gerrit.cloudera.org:8080/#/c/18867/15/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/15/src/kudu/util/threadpool-test.cc@184
PS15, Line 184: // delay_ms is a random time: 1ms or
> How about use class Random to generate random data, the test may run very f
ok, have changed.


http://gerrit.cloudera.org:8080/#/c/18867/15/src/kudu/util/threadpool-test.cc@204
PS15, Line 204: pool_->Shutdown();
> It it used to ensure the test can be finished safetly, what will happen if 
Remove it,  it will also succeed, but the case is not better than before, because all tasks may be pending.

Every 'ContinuousIssueTraceStatement' run ensure submit a new delay task, so the scheduler's queue(std::mutimap)  always have some tasks, this will encounter running tasks when 'pool_' shutdown


http://gerrit.cloudera.org:8080/#/c/18867/15/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/15/src/kudu/util/threadpool.cc@538
PS15, Line 538: has been sh
> nit: has been shutdown?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 16
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 14:58:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 17:

(13 comments)

Thanks for your crs. I have fixed them and you can review them again.

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutting down a thread pool with Schedule
> nit: When shutting down a thread pool with SchedulerThread
Done


http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@11
PS16, Line 11: uled on the tok
> nit: a task is scheduled on the token
Done


http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutting down a thread pool with SchedulerThread, thread pool's variable
            : 'scheduler_' is deleted and set to nullptr, at the same time, if thread pool
            : token is not shutdown and a task is scheduled on the toke
> Since the ThreadPoolToken is created after ThreadPool is created, when we n
The first question, yes. The shutdown order is needed by before, not the patch added.

So if thread pool has shut down, token invoke 'schedule/Submit' , core dump may happen.


http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@14
PS16, Line 14: fix
> nit: fixes
Done


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool-test.cc@159
PS16, Line 159:   SleepFor(MonoDelta::FromMilliseconds(2 * kDelayMs));
              :   // make sure a
> Is it essential to have both SleepFor() and pool_->Wait() here?  Wouldn't j
OK, I will add some comments.   

'SleepFor(MonoDelta::FromMilliseconds(2 * kDelayMs));'  make sure the line 154 task scheduled in SchedulerThread.


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool-test.cc@256
PS16, Line 256:   ASSERT_OK(pool_->Schedule(
              :       nullptr, [&counter]() { SimpleTaskMetho
> How essential is this extra wait?  It seems there is pool_->Wait() below, s
make sure the delayed task execute, pool_->Wait()  do not wait. I add a comment.

I make an experiment, at cpu 32 cores, --stress_cpu_threads=4000 is ok.


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h
File src/kudu/util/threadpool.h:

http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h@187
PS16, Line 187:   class SchedulerTask {
              :    public:
              :     SchedulerTask(ThreadPoolToken* thread_pool_token, std::function<void()> func)
              :     
> Is it possible to move this into the 'private' section of the class?
change it to 'class'.


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h@612
PS16, Line 612:   const ThreadPoolMetrics metrics_;
> Can the 'scheduler' be a unique_ptr so we don't need to explicitly delete i
I will adding a document, because of we should control the deleting order.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@678
PS13, Line 678: ntil)) {
> Done
WaitForScheduler is removed.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@678
PS13, Line 678: 
              :       return false;
> Done
WaitForScheduler is removed.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@680
PS13, Line 680:   }
> This sounds like a good idea -- if you find this is needed elsewhere but no
ok


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc@263
PS16, Line 263: MonoTime execute_time = MonoTime::Now();
              :   execute_time.AddDelta(MonoDelta::FromMilliseconds(delay_ms));
              :   return pool_->Schedule(this, std::move(f), execute_time);
              : }
              : 
              : voi
> nit: Maybe we can move this check into ThreadPool::Schedule?
ok


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc@540
PS16, Line 540: scheduler_->Schedule(token, std::move(f), execute_time);
> Should this be wrapped into RETURN_NOT_OK()?  Maybe, just
The function 'scheduler_->Schedule()' can return status, and it must return Status::OK();
Fixing it is ok, but it seems no strong reason, 
someone may like return 'void'.

What do you think?  I'll do this according to your reply.

`
  void Schedule(ThreadPoolToken* token,
                std::function<void()> f,
                const MonoTime& execute_time) {
    MutexLock unique_lock(mutex_);
    future_tasks_.insert({execute_time, SchedulerTask({token, std::move(f)})});
  }
`



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 17
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Fri, 07 Oct 2022 12:39:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 13:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.h
File src/kudu/util/threadpool.h:

http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.h@187
PS13, Line 187:   struct SchedulerTask {
              :     ThreadPoolToken* thread_pool_token;
              :     std::function<void()> f;
              :   };
> Is this structure really needs to be public?
I just use a simple struct, no other reasons.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.h@192
PS13, Line 192:   void Schedule(ThreadPoolToken* token,
              :                 std::function<void()> f,
              :                 const MonoTime& execute_time) {
> Please document this public method as well: what it does and how the parame
token can be null. Add a ut case.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.h@276
PS13, Line 276:   Status Schedule(ThreadPoolToken* token, std::function<void()> f, MonoTime execute_time);
> Please follow the pattern for other methods in this interface and add a con
Comments have been added. token can be null.

In class 'ThreadPool', I want to protect ‘scheduler_’ using a mutexlock.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.h@506
PS13, Line 506:   void WaitForScheduler();
> Where is this method used in addition to the ThreadPoolTest.TestSimpleTasks
Have Removed.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.h@608
PS13, Line 608:   // Protect 'scheduler_'.
> Since WaitForScheduler() acquires another lock along with 'scheduler_lock_'
'WaitForScheduler' is removed. deadlock is impossible. The two lock cann't locked at the same time.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@678
PS13, Line 678: pending tasks finish
> for pending tasks to finish
Done


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@678
PS13, Line 678: , for example
              :   // at unit tests.
> I'm not sure this reference to unit tests is quite relevant here.  Maybe, a
Done


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@680
PS13, Line 680: MutexLock l(scheduler_lock_);
> From what I can see in the code, ThreadPool::WaitForScheduler() isn't used 
Yes, you are right. My origin intention is, users who use it, should know the tasks will finish to avoid take a long time to wait tasks's finish. Just like my unittest cases.

But this is hard. So, deleting it is better.



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 13
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 19 Sep 2022 11:40:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Yingchun Lai, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18867

to look at the new patch set (#7).

Change subject: [threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread
......................................................................

[threadpool] Fix an extreme coredump bug when shutdown thread pool with SchedulerThread

When shutdown thread pool with SchedulerThread, thread pool's variable 'scheduler_' is
deleted and 'scheduler_' = nullptr, at the same time, thread pool token not shutdowned and
Schedule a task, this may cause a coredump because of 'scheduler_' = nullptr.

This patch add a mutex lock in class 'ThreadPool' to protect variable 'scheduler_'.

Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
3 files changed, 54 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18867/7
-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 7
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [threadpool] Fix unsafe behaviour when SchedulerThread shutdown

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 17:

(4 comments)

Thanks your crs.

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutting down a thread pool with Schedule
> Done
Done


http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutting down a thread pool with SchedulerThread, thread pool's variable
            : 'scheduler_' is deleted and set to nullptr, at the same time, if thread pool
            : token is not shutdown and a task is scheduled on the toke
> Could you explain what scenario we need to call token->Schedule() after shu
There are 3 participants:
1. user threads, which use token->Scheduler(...), set a delayed task after a period time.
2. threadpool' threads, which will run the delay tasks or normal tasks.
3. threadpool's scheduler  thread,  step 1's delayed tasks will firstly be pushed into scheduler thread's queue, and then every period take some tasks to run.

The  safe shutdown orders as  'class SchedulerThread {' comments:
1. thread pool shutdown: to shutdown scheduler thread. (The following codes will do threadpool's shutdown)
2. token shutdown and release
3. pool shutdown(In fact 1 has shutdown) and release

At step 1, it takes a small period time. 
First scheduler_thread shutdown, then execute shutdown logic codes of thread pool before.

When scheduler_thread shutdown finished, but thread pool is not shutdown.  thread pool token can be used by user threads and token->Scheduler(...) can be called, but at the time, scheduler_thread is nullptr, so core dump would happen at https://gerrit.cloudera.org/c/18867/17/src/kudu/util/threadpool.cc#b267.


http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@11
PS16, Line 11: uled on the tok
> Done
Done


http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@14
PS16, Line 14: fix
> Done
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 17
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 01 Nov 2022 07:58:39 +0000
Gerrit-HasComments: Yes