Posted to issues@ozone.apache.org by "Sammi Chen (Jira)" <ji...@apache.org> on 2022/05/05 03:21:00 UTC
[jira] [Resolved] (HDDS-6377) Redundant loop while doing triggerHeartbeat in DatanodeStateMachine
[ https://issues.apache.org/jira/browse/HDDS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sammi Chen resolved HDDS-6377.
------------------------------
Resolution: Fixed
> Redundant loop while doing triggerHeartbeat in DatanodeStateMachine
> -------------------------------------------------------------------
>
> Key: HDDS-6377
> URL: https://issues.apache.org/jira/browse/HDDS-6377
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Janus Chow
> Assignee: Janus Chow
> Priority: Major
> Labels: pull-request-available
>
> The heartbeat loop in DatanodeStateMachine is as follows (the L1..L27 labels are referenced in the steps below).
>
> {code:java}
> L1  while (context.getState() != DatanodeStates.SHUTDOWN) {
> L2    try {
> L3      LOG.debug("Executing cycle Number : {}", context.getExecutionCount());
> L4      long heartbeatFrequency = context.getHeartbeatFrequency();
> L5      nextHB.set(Time.monotonicNow() + heartbeatFrequency);
> L6      context.execute(executorService, heartbeatFrequency,
> L7          TimeUnit.MILLISECONDS);
> L8    } catch (InterruptedException e) {
> L9      // Someone has sent interrupt signal, this could be because
> L10     // 1. Trigger heartbeat immediately
> L11     // 2. Shutdown has been initiated.
> L12     Thread.currentThread().interrupt();
> L13   } catch (Exception e) {
> L14     LOG.error("Unable to finish the execution.", e);
> L15   }
> L16
> L17   now = Time.monotonicNow();
> L18   if (now < nextHB.get()) {
> L19     if (!Thread.interrupted()) {
> L20       try {
> L21         Thread.sleep(nextHB.get() - now);
> L22       } catch (InterruptedException e) {
> L23         // triggerHeartbeat is called during the sleep
> L24         Thread.currentThread().interrupt();
> L25       }
> L26     }
> L27   }
> {code}
> The redundant loop iteration happens as follows:
> # triggerHeartbeat() is called while stateMachineThread is sleeping at L21.
> # An InterruptedException is caught at L22; throwing it resets the thread's "interrupted" state to false.
> # L24 sets the "interrupted" state back to true.
> # Back at the top of the while loop, the try block starting at L2 is interrupted immediately because the "interrupted" state is true. Control goes to L8, and L12 sets the "interrupted" state to true again.
> # At L19, "Thread.interrupted()" returns true, so the sleep is skipped and the while loop moves on to its next iteration. This call also resets the "interrupted" state to false.
> # In the try block starting at L2, the "interrupted" state is now false, so the heartbeat is finally triggered.
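The flag transitions in steps 2, 3, and 5 above rely on standard java.lang.Thread behavior and can be observed in isolation. The following is a minimal sketch, not code from Ozone: the class name InterruptFlagDemo and the observe() helper are hypothetical, and the interrupt that triggerHeartbeat() would deliver is simulated by the thread interrupting itself before it sleeps.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of Ozone.
class InterruptFlagDemo {

  /** Returns the "interrupted" flag values observed at each step, in order. */
  static boolean[] observe() {
    boolean[] seen = new boolean[4];

    // Stand-in for triggerHeartbeat(): set the flag so that sleep() throws.
    Thread.currentThread().interrupt();
    try {
      Thread.sleep(1000);
    } catch (InterruptedException e) {
      // Step 2: throwing InterruptedException cleared the flag.
      seen[0] = Thread.currentThread().isInterrupted();
      // Step 3: the catch block sets the flag again (as L24 does).
      Thread.currentThread().interrupt();
      seen[1] = Thread.currentThread().isInterrupted();
    }

    // Step 5: Thread.interrupted() returns the flag and clears it.
    seen[2] = Thread.interrupted();
    seen[3] = Thread.interrupted();
    return seen;
  }

  public static void main(String[] args) {
    // prints [false, true, true, false]
    System.out.println(Arrays.toString(observe()));
  }
}
```

Note the difference between isInterrupted(), which only reads the flag, and the static Thread.interrupted(), which reads and clears it. The check at L19 uses the latter, which is why the flag is false again in step 6.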
> The issue is in step 3 above: we don't need to set the "interrupted" state back to true at L24. Without that call, the next loop iteration can execute the heartbeat directly.
>
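To make the cost of L24 concrete, here is a rough single-threaded model of the loop described above. It is a sketch under stated assumptions, not the actual patch: HeartbeatLoopSketch, execute(), and passesUntilHeartbeat() are hypothetical names, execute() stands in for context.execute() and is assumed to abort with InterruptedException whenever the flag is set, and the sleep itself is omitted. The model starts at the moment the sleep at L21 has just been interrupted.

```java
// Hypothetical model of the loop; not part of Ozone.
class HeartbeatLoopSketch {

  /** Stand-in for context.execute(): assumed to abort if the flag is set. */
  static void execute(int[] heartbeats) throws InterruptedException {
    if (Thread.interrupted()) {
      throw new InterruptedException();
    }
    heartbeats[0]++;
  }

  /**
   * Counts the loop passes needed until the next heartbeat runs, starting
   * right after the sleep at L21 was interrupted.
   *
   * @param restoreFlagAfterSleep true models the current code (L24 present),
   *                              false models the proposed change
   */
  static int passesUntilHeartbeat(boolean restoreFlagAfterSleep) {
    int[] heartbeats = {0};
    Thread.interrupted(); // the InterruptedException from sleep cleared the flag
    if (restoreFlagAfterSleep) {
      Thread.currentThread().interrupt(); // L24
    }

    int passes = 0;
    while (heartbeats[0] == 0) {
      passes++;
      try {
        execute(heartbeats); // L6
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // L12
      }
      if (!Thread.interrupted()) { // L19: reads and clears the flag
        // The real loop would sleep here until the next heartbeat is due.
      }
    }
    return passes;
  }

  public static void main(String[] args) {
    System.out.println(passesUntilHeartbeat(true));  // prints 2
    System.out.println(passesUntilHeartbeat(false)); // prints 1
  }
}
```

In this model, keeping L24 costs one extra pass in which execute() is aborted before the heartbeat is sent, which matches steps 4 and 5 of the description. Dropping L24 lets the first pass after the interrupted sleep send the heartbeat.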
--
This message was sent by Atlassian Jira
(v8.20.7#820007)