Posted to issues@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2021/01/14 15:19:00 UTC
[jira] [Comment Edited] (HBASE-25505) ZK watcher threads are daemonized; reconsider

    [ https://issues.apache.org/jira/browse/HBASE-25505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264952#comment-17264952 ] 

Duo Zhang edited comment on HBASE-25505 at 1/14/21, 3:18 PM:
-------------------------------------------------------------

I think I found a possible cause of this situation, especially since [~larsh] confirmed that the unclosed ZKWatcher is in ReplicationLogCleaner.

Before HBASE-23340, ReplicationLogCleaner creates its own ZKWatcher, and we close it in the stop method, which is called from the cleanup method of CleanerChore.

Here is the problem. The cleanup method of a ScheduledChore is only called from the run method, so if you just call stop on the stopper instance that was passed to the ScheduledChore when creating it, everything will be fine, because the next run sees the stop and calls cleanup. But in HMaster.stopChore, we use ChoreService.cancelChore to stop the ScheduledChore. So if the ScheduledChore has not been scheduled again after we set stopped to true for HMaster and before we call cancelChore (I cannot even be sure that setting stopped to true happens before we call cancelChore...), the cleanup method will never be executed. And this is likely the case, as the default schedule interval is 10 minutes...
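
To make the sequence above concrete, here is a minimal sketch using only java.util.concurrent. ToyChore and cleanupCallsAfterCancel are hypothetical names, not the real ScheduledChore or ChoreService code; the sketch only assumes that cleanup is reachable solely from run, as described above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class CancelSkipsCleanup {
  /** Hypothetical stand-in for a chore; not the real HBase ScheduledChore. */
  static class ToyChore implements Runnable {
    final AtomicBoolean stopped;  // plays the role of the stopper
    final AtomicInteger cleanupCalls = new AtomicInteger();

    ToyChore(AtomicBoolean stopped) {
      this.stopped = stopped;
    }

    @Override
    public void run() {
      if (stopped.get()) {
        cleanup();  // the only place cleanup is ever invoked
        return;
      }
      // periodic work would go here
    }

    void cleanup() {
      cleanupCalls.incrementAndGet();
    }
  }

  /** Stops the owner, cancels the chore before its next run, and returns the cleanup count. */
  static int cleanupCallsAfterCancel() {
    ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
    try {
      AtomicBoolean stopped = new AtomicBoolean(false);
      ToyChore chore = new ToyChore(stopped);
      // The first run is 10 minutes away, mirroring the long interval mentioned above.
      ScheduledFuture<?> future = pool.scheduleAtFixedRate(chore, 10, 10, TimeUnit.MINUTES);
      stopped.set(true);     // the owner is stopped...
      future.cancel(false);  // ...but the chore is cancelled before run() can observe it
      return chore.cleanupCalls.get();
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("cleanup calls: " + cleanupCallsAfterCancel());  // prints "cleanup calls: 0"
  }
}
```

In this toy model, whatever cleanup was supposed to release stays open after the cancel.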

I added a UT in the uploaded patch to show that calling cancelChore does not trigger a call to cleanup.

I think a possible fix is to also call cleanup in the cancelChore method of ChoreService. We just need to add a comment saying that implementations should make sure the method can be called multiple times without side effects.
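
As a rough illustration of what "can be called multiple times without side effects" could look like, here is a sketch that guards cleanup with an AtomicBoolean. The class and method names are hypothetical and this is not the actual patch; it only shows one way an implementation might tolerate cleanup being reached from both the cancel path and the run path.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class IdempotentCleanup {
  /** Hypothetical resource-owning chore; the names are illustrative only. */
  static class ToyChore {
    private final AtomicBoolean cleanedUp = new AtomicBoolean(false);
    final AtomicInteger resourceCloses = new AtomicInteger();

    /** Safe to call repeatedly: only the first call releases the resource. */
    void cleanup() {
      if (cleanedUp.compareAndSet(false, true)) {
        resourceCloses.incrementAndGet();  // stands in for closing a watcher
      }
    }
  }

  /** Hypothetical cancel path that also runs cleanup, as proposed above. */
  static void cancelChore(ToyChore chore) {
    // ...cancelling the scheduled future would happen here...
    chore.cleanup();
  }

  /** Runs cleanup through both paths and returns how often the resource was closed. */
  static int closesAfterCancelAndRunPath() {
    ToyChore chore = new ToyChore();
    cancelChore(chore);  // cleanup via the cancel path
    chore.cleanup();     // cleanup again, e.g. from a late run()
    return chore.resourceCloses.get();
  }

  public static void main(String[] args) {
    System.out.println("resource closes: " + closesAfterCancelAndRunPath());  // prints "resource closes: 1"
  }
}
```

The compareAndSet guard also keeps the release single-shot if the two paths race on different threads.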

Thanks.


was (Author: apache9):
I think I found a possible problem to lead to this situation, especially that [~larsh] confirmed that the not closed ZKWatcher is in ReplicationLogCleaner.

Before HBASE-23340, ReplicationLogCleaner will create its own ZKWatcher, as we will close it in the stop method, which will be called in the cleanup method of CleanerChore.

Then here comes the problem. The cleanup method of a ScheduledChore will only be called in the run method, so if you just call stop on the stopper instance which is passed to the ScheduledChore when creating it, everything will be fine. But in HMaster.stopChore, we use ChoreService.cancelChore to stop the ScheduledChore. So if the ScheduledChore has been scheduled again after we set stopped to true for HMaster and before we call cancelChore(I even can not make sure setting stopped to true is happened before we call cancelChore...), the cleanup method will never be executed. And this is likely the case as the default schedule interval is 10 minutes...

I added a UT in the uploaded patch to show that, calling cancelChore will not introduce a call to cleanup.

I think a possible fix is to also call cleanup in the cancelChore method of ChoreService, just need to add a comment to say that the implementation should make sure that the method can be called multiple times without side effect.

Thanks.

> ZK watcher threads are daemonized; reconsider
> ---------------------------------------------
>
>                 Key: HBASE-25505
>                 URL: https://issues.apache.org/jira/browse/HBASE-25505
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>         Attachments: ScheduledChore.cleanup-not-called.diff
>
>
> On HBASE-25279 there was some discussion and difference of opinion about having ZK watcher pool threads be daemonized. This is not necessarily a problem but should be reconsidered. 
> Daemon threads are subject to abrupt termination during JVM shutdown and therefore may be interrupted before state changes are complete or resources are released. 
> As long as ZK watchers are properly closed by shutdown logic the pool threads will be terminated in a controlled manner and the JVM will exit. 


