Posted to issues@ozone.apache.org by "Mark Gui (Jira)" <ji...@apache.org> on 2021/06/28 06:35:00 UTC

[jira] [Updated] (HDDS-5394) Fix skipped volume check due to disk.check.min.gap

     [ https://issues.apache.org/jira/browse/HDDS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Gui updated HDDS-5394:
---------------------------
    Description: 
After HDDS-5268, the datanode checks its data volumes and Ratis volumes together in a single periodic volume checker.

In practice, however, data volumes and Ratis volumes are checked in two separate `checkAllVolumes` calls. Each `checkAllVolumes` call first checks whether it is running within the time gap set by 'disk.check.min.gap' since the previous call. Because the two calls are made back to back, the second call always falls inside that gap, so the Ratis volumes are always skipped.
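A minimal sketch of why the second call is skipped, assuming a simplified checker in which every `checkAllVolumes` call shares one last-check timestamp guarded by the min gap. The classes `MinGapSkipDemo` and `VolumeChecker`, the method `periodicRun`, and the volume names are hypothetical illustrations, not Ozone code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical, simplified model of the min-gap guard (not Ozone's actual code).
 * Every checkAllVolumes call shares one lastAllVolumesCheck timestamp.
 */
public class MinGapSkipDemo {

    static class VolumeChecker {
        private final long minGapMs;       // models 'disk.check.min.gap'
        private long lastAllVolumesCheck;  // shared by all checkAllVolumes calls

        VolumeChecker(long minGapMs) {
            this.minGapMs = minGapMs;
            this.lastAllVolumesCheck = -minGapMs; // so the very first call is not skipped
        }

        /** Returns the volumes checked, or an empty list if the call is skipped. */
        List<String> checkAllVolumes(List<String> volumes, long nowMs) {
            if (nowMs - lastAllVolumesCheck < minGapMs) {
                return Collections.emptyList(); // too soon after the previous call
            }
            lastAllVolumesCheck = nowMs;
            return new ArrayList<>(volumes);   // stand-in for the real disk checks
        }
    }

    /** One periodic run that checks the two volume sets with two separate calls. */
    static List<String> periodicRun(VolumeChecker checker, long nowMs) {
        List<String> result = new ArrayList<>();
        result.addAll(checker.checkAllVolumes(List.of("data-1", "data-2"), nowMs));
        // The Ratis call follows a few milliseconds later, still inside the gap.
        result.addAll(checker.checkAllVolumes(List.of("ratis-1"), nowMs + 5));
        return result;
    }

    public static void main(String[] args) {
        long gap = 15 * 60 * 1000L; // 15 minutes
        VolumeChecker checker = new VolumeChecker(gap);
        System.out.println(periodicRun(checker, 0));   // [data-1, data-2]
        System.out.println(periodicRun(checker, gap)); // [data-1, data-2]
    }
}
```

Each run checks the data volumes and drops the Ratis volumes, because the second call always lands inside the gap opened by the first.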

To fix this, we could move the gap check into `checkAllVolumeSets`, which checks all volume sets one by one in a single pass.
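A sketch of the proposed fix under similar simplifying assumptions: the gap is tested once at the start of a hypothetical `checkAllVolumeSets(volumeSets, nowMs)`, and every set is then checked in that pass. The signature, the `checkVolumes` stand-in, and the volume names are assumptions for illustration only, not Ozone's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Ozone code): the min-gap guard is evaluated once
 * per pass, and every volume set is checked inside that pass.
 */
public class CheckAllVolumeSetsDemo {

    static class VolumeChecker {
        private final long minGapMs;      // models 'disk.check.min.gap'
        private long lastAllVolumesCheck;

        VolumeChecker(long minGapMs) {
            this.minGapMs = minGapMs;
            this.lastAllVolumesCheck = -minGapMs; // first pass is never skipped
        }

        /** Checks each set in turn; the gap is tested once for the whole pass. */
        List<String> checkAllVolumeSets(List<List<String>> volumeSets, long nowMs) {
            List<String> checked = new ArrayList<>();
            if (nowMs - lastAllVolumesCheck < minGapMs) {
                return checked;               // whole pass skipped; no set is singled out
            }
            lastAllVolumesCheck = nowMs;
            for (List<String> set : volumeSets) {
                checked.addAll(checkVolumes(set)); // no per-set gap test here
            }
            return checked;
        }

        /** Hypothetical stand-in for the real per-volume health check. */
        private List<String> checkVolumes(List<String> volumes) {
            return new ArrayList<>(volumes);
        }
    }

    public static void main(String[] args) {
        long gap = 15 * 60 * 1000L; // 15 minutes
        VolumeChecker checker = new VolumeChecker(gap);
        List<List<String>> sets = List.of(List.of("data-1", "data-2"), List.of("ratis-1"));
        System.out.println(checker.checkAllVolumeSets(sets, 0));      // [data-1, data-2, ratis-1]
        System.out.println(checker.checkAllVolumeSets(sets, 60_000)); // []
    }
}
```

Because the guard sits outside the loop, a pass is either skipped as a whole or covers every volume set, so no set is starved by its position in the sequence.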

There is another problem: the datanode implements two volume checkers:
 * Periodic Volume Checker
 * On-demand Volume Checker (HDDS-5089)

The periodic volume checker is scheduled at a fixed rate, every 15 minutes by default. However, 'disk.check.min.gap' also defaults to 15 minutes, and it also sets the minimum time between two successive checks of a single volume. As a result, no on-demand check can run in the 15 minutes between two periodic checks.

To fix this, we could make 'periodic.disk.check.interval.minutes' longer, for example 1 hour. Since the on-demand disk checker is available, this should be fine.
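The timing interaction can be modelled with a small, hypothetical simulation of one volume. It assumes that periodic and on-demand checks pass through the same per-volume gate controlled by the min gap, and that the periodic check runs before any on-demand request scheduled for the same minute. `MinGapTimeline` and `acceptedOnDemand` are illustrative names, not Ozone APIs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical timeline model (not Ozone code): one volume, a periodic check
 * every periodicIntervalMin minutes, and on-demand requests at given minutes.
 * Assumption: both kinds of check share one per-volume min-gap gate.
 */
public class MinGapTimeline {

    /** Returns the request minutes at which an on-demand check actually ran. */
    static List<Integer> acceptedOnDemand(int periodicIntervalMin, int minGapMin,
                                          List<Integer> requestMinutes, int horizonMin) {
        List<Integer> accepted = new ArrayList<>();
        long lastCheck = Long.MIN_VALUE / 2; // "never checked"; halved to avoid overflow
        for (int t = 0; t <= horizonMin; t++) {
            // Periodic check fires first at this minute (fixed-rate schedule).
            if (t % periodicIntervalMin == 0 && t - lastCheck >= minGapMin) {
                lastCheck = t;
            }
            // Then any on-demand request at this minute passes the same gate.
            if (requestMinutes.contains(t) && t - lastCheck >= minGapMin) {
                lastCheck = t;
                accepted.add(t);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<Integer> requests = List.of(7, 22, 40);
        // Defaults from the description: 15-minute interval, 15-minute min gap.
        System.out.println(acceptedOnDemand(15, 15, requests, 59)); // []
        // Proposed: 60-minute periodic interval, min gap unchanged.
        System.out.println(acceptedOnDemand(60, 15, requests, 59)); // [22, 40]
    }
}
```

With both values at 15 minutes, every request lands less than 15 minutes after the latest periodic check and is rejected; with a 60-minute interval, requests made at least 15 minutes after the previous check go through.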

> Fix skipped volume check due to disk.check.min.gap
> --------------------------------------------------
>
>                 Key: HDDS-5394
>                 URL: https://issues.apache.org/jira/browse/HDDS-5394
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major
>