Posted to issues@ozone.apache.org by "Mark Gui (Jira)" <ji...@apache.org> on 2021/06/28 06:35:00 UTC
[jira] [Updated] (HDDS-5394) Fix skipped volume check due to disk.check.min.gap
[ https://issues.apache.org/jira/browse/HDDS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Gui updated HDDS-5394:
---------------------------
Description:
After HDDS-5268, datanode data volumes and ratis volumes are checked together by a single periodic volume checker.
In practice, however, data volumes and ratis volumes are checked in 2 separate `checkAllVolumes` calls. Each `checkAllVolumes` call tests whether it is running within the time gap controlled by 'disk.check.min.gap' since the previous call, and skips the check if so. Because the 2 calls run back to back, the ratis volumes are always skipped.
To fix this, we could move the gap check into `checkAllVolumeSets`, which checks the volume sets one by one in a single pass.
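The skip described above can be reproduced with a toy model. This is only a sketch: the class `GapCheckSketch`, its fields, and the method signatures are hypothetical and are not the actual Ozone code; only the names `checkAllVolumes`, `checkAllVolumeSets` and 'disk.check.min.gap' come from the description, and the 15 min value is the default mentioned there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the gap check. Hypothetical, not the real Ozone classes. */
public class GapCheckSketch {
    // Stands in for 'disk.check.min.gap' (15 mins by default).
    static final long MIN_GAP_MS = 15 * 60 * 1000L;

    // Initialised so that the very first call is never skipped.
    private long lastCheckMs = -MIN_GAP_MS;

    // Names of the volume sets that were actually checked.
    final List<String> checked = new ArrayList<>();

    /** Behaviour as described: the gap is tested on every call. */
    boolean checkAllVolumes(String volumeSet, long nowMs) {
        if (nowMs - lastCheckMs < MIN_GAP_MS) {
            return false; // too soon after the previous call: skipped
        }
        lastCheckMs = nowMs;
        checked.add(volumeSet);
        return true;
    }

    /** Proposed behaviour: test the gap once, then check every set. */
    boolean checkAllVolumeSets(List<String> volumeSets, long nowMs) {
        if (nowMs - lastCheckMs < MIN_GAP_MS) {
            return false;
        }
        lastCheckMs = nowMs;
        checked.addAll(volumeSets);
        return true;
    }

    public static void main(String[] args) {
        GapCheckSketch before = new GapCheckSketch();
        before.checkAllVolumes("data", 0L);
        before.checkAllVolumes("ratis", 1L); // 1 ms later: inside the gap
        System.out.println("per-call gap check: " + before.checked);

        GapCheckSketch after = new GapCheckSketch();
        after.checkAllVolumeSets(Arrays.asList("data", "ratis"), 0L);
        System.out.println("single gap check:   " + after.checked);
    }
}
```

In this model the per-call variant only ever records the data volumes, while the single-pass variant records both sets, which is the intent of the proposed fix.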
There is another problem. There are 2 volume checkers implemented in the datanode:
* Periodic Volume Checker
* On-demand Volume Checker (HDDS-5089)
The periodic volume checker is scheduled at a fixed rate, 15 mins by default. But 'disk.check.min.gap' is also 15 mins by default, and it also controls the minimum time gap between 2 successive checks of a single volume. So within the 15 mins between 2 periodic checks, no on-demand check can happen.
To fix this, we could make 'periodic.disk.check.interval.minutes' longer, such as 1 hour. Since we have the on-demand disk checker, this should be fine.
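The arithmetic behind this can be sketched as follows. The class `IntervalVsGapSketch` and its method are hypothetical, and the model is simplified: it assumes each periodic check actually runs and that no other on-demand check has touched the volume in between.

```java
/** Toy arithmetic for the interval vs. gap interaction. Hypothetical helper. */
public class IntervalVsGapSketch {
    /**
     * Minutes per periodic interval during which an on-demand check of a
     * volume would be accepted, i.e. the part of the interval that lies
     * beyond the minimum gap since the last periodic check.
     */
    static long openWindowMinutes(long periodicIntervalMin, long minGapMin) {
        return Math.max(0L, periodicIntervalMin - minGapMin);
    }

    public static void main(String[] args) {
        // Defaults from the description: interval 15 mins, gap 15 mins.
        System.out.println(openWindowMinutes(15, 15)); // 0
        // Proposed: interval of 1 hour, gap unchanged.
        System.out.println(openWindowMinutes(60, 15)); // 45
    }
}
```

Under these assumptions the default settings leave no window for on-demand checks, while a 1 hour interval leaves 45 mins of each hour open to them.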
> Fix skipped volume check due to disk.check.min.gap
> --------------------------------------------------
>
> Key: HDDS-5394
> URL: https://issues.apache.org/jira/browse/HDDS-5394
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Mark Gui
> Assignee: Mark Gui
> Priority: Major