Posted to issues@ozone.apache.org by "Ethan Rose (Jira)" <ji...@apache.org> on 2022/04/12 20:05:00 UTC

[jira] [Commented] (HDDS-6577) Configurations to reserve HDDS volume space are not followed

    [ https://issues.apache.org/jira/browse/HDDS-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521317#comment-17521317 ] 

Ethan Rose commented on HDDS-6577:
----------------------------------

I have set up a test that simulates full volumes using fixed-size tmpfs mounts in Docker [here|https://github.com/errose28/hadoop-ozone/tree/HDDS-6577-test-hdds-capacity].

The test uses a [modified Ozone configuration|https://github.com/errose28/hadoop-ozone/blob/HDDS-6577-test-hdds-capacity/hadoop-ozone/dist/src/main/compose/upgrade/compose/non-ha/docker-config] to run the non-HA upgrade compose cluster (which already uses volume mounts). A smaller container size, block size, chunk size, heartbeat interval, and node report interval are configured to make the test practical in a local Docker setup. Once the cluster is up, the command `docker-compose exec om ozone freon ockg -n 100 --size=10000000` can be used to generate enough data to overflow the 1 GB tmpfs mount on each datanode. If the configs are being honored, the command should eventually fail, and running `df -h /data` in a datanode container should show that approximately the configured amount of space is still available.
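
As a rough illustration of that final check, the sketch below compares the free space on a mount against an expected reserved size. The helper names `reserve_respected` and `check_mount`, and the 10% tolerance, are assumptions made for illustration; they are not part of the test branch.

```python
import shutil


def reserve_respected(free_bytes: int, reserved_bytes: int,
                      tolerance: float = 0.10) -> bool:
    """Return True if free space is within `tolerance` of the reserved size.

    Hypothetical helper: `tolerance` is an assumed fraction of reserved_bytes.
    """
    if reserved_bytes <= 0:
        raise ValueError("reserved_bytes must be positive")
    return abs(free_bytes - reserved_bytes) <= tolerance * reserved_bytes


def check_mount(path: str, reserved_bytes: int) -> bool:
    # shutil.disk_usage reports total, used, and free bytes
    # for the filesystem that contains `path`.
    usage = shutil.disk_usage(path)
    return reserve_respected(usage.free, reserved_bytes)
```

For example, `check_mount("/data", 100 * 1024**2)` would check for roughly 100 MiB left on the datanode mount. The 100 MiB value is only an example, and this assumes a Python interpreter is available where the check runs.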

Additional logging has been added to RoundRobinVolumeChoosingPolicy (datanode, should use hdds.datanode.du.reserved) and SCMCommonPlacementPolicy (SCM, should use hdds.datanode.storage.utilization.critical.threshold) to print information about the byte sizes being used in decisions. The final implementation may want to add similar log messages at the DEBUG level for future use.
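
To make the two decision points concrete, here is a minimal sketch of how each check might compute and log its byte sizes. It is not the Ozone implementation: the function names, the formulas, and the reading of the threshold as a maximum utilization percentage are all assumptions for illustration.

```python
import logging

log = logging.getLogger("volume_space_sketch")


def datanode_has_room(capacity: int, used: int, reserved: int,
                      required: int) -> bool:
    """Assumed datanode-side check: keep `reserved` bytes free on the volume."""
    available = capacity - used - reserved
    log.debug("datanode: capacity=%d used=%d reserved=%d available=%d required=%d",
              capacity, used, reserved, available, required)
    return available >= required


def scm_has_room(capacity: int, used: int, required: int,
                 threshold_percent: int = 95) -> bool:
    """Assumed SCM-side check: usage after the write stays within the threshold."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    limit = capacity * threshold_percent // 100
    log.debug("scm: capacity=%d used=%d limit=%d required=%d",
              capacity, used, limit, required)
    return used + required <= limit
```

Under these assumptions the two checks can disagree. With a 1 GB volume that has 920 MB used and 100 MB reserved, `datanode_has_room` rejects a 10 MB write while `scm_has_room` accepts it at 95%. That is the kind of conflict the issue description asks to have reconciled.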

 

> Configurations to reserve HDDS volume space are not followed
> ------------------------------------------------------------
>
>                 Key: HDDS-6577
>                 URL: https://issues.apache.org/jira/browse/HDDS-6577
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode, SCM
>            Reporter: Ethan Rose
>            Priority: Major
>
> Ozone currently has two configuration keys that can be used to prevent datanode volumes from being completely filled up:
> hdds.datanode.du.reserved: A list of key-value pairs mapping each volume's path to a fixed size that should be left empty on the volume. There is no default value.
> hdds.datanode.storage.utilization.critical.threshold: A percentage of each datanode volume that should be left empty. The default value is 95%.
> In a live cluster, we have seen volumes fill up past the default 95% value. In a docker based test using a fixed size tmpfs volume mount, the configs also did not seem to be respected. It is also unclear how the configs are reconciled if configured with conflicting values. This Jira aims to provide a reliable way to limit datanode volume usage, and add tests to simulate fixed capacity disks for testing the functionality.


