Posted to issues@ozone.apache.org by "Siyao Meng (Jira)" <ji...@apache.org> on 2023/01/09 19:25:00 UTC

[jira] [Updated] (HDDS-4539) Container Health Task should not run until Recon has reached steady state.

     [ https://issues.apache.org/jira/browse/HDDS-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siyao Meng updated HDDS-4539:
-----------------------------
    Fix Version/s: 1.4.0
                       (was: 1.3.0)

> Container Health Task should not run until Recon has reached steady state.
> --------------------------------------------------------------------------
>
>                 Key: HDDS-4539
>                 URL: https://issues.apache.org/jira/browse/HDDS-4539
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Recon
>            Reporter: Aravindan Vijayan
>            Assignee: Devesh Kumar Singh
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>
> On a cluster with millions of containers or hundreds of Datanodes, Recon takes some time to reach a steady state (all active DNs and containers reported). If the container health task runs before then, it can incorrectly flag most containers as missing. This was seen on a cluster where Recon was slow to reach steady state because of HDDS-4403, and it also leads to the UI problem described in HDDS-4402.
> We need to make sure the container health task does not run before the cluster has reached steady state. This could be done with a fixed wait time (~10 minutes) or by checking Recon's SCM state.
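
The two options named in the description (a fixed wait, or a check of Recon's SCM state) can be illustrated with a small gate that a periodic task consults before doing any work. This is only a sketch under stated assumptions: SteadyStateGate, mayRun, and the reportsComplete predicate are hypothetical names chosen for illustration and are not Recon's actual classes or the fix that was merged. Passing "() -> true" as the predicate gives a pure fixed wait; passing Duration.ZERO gives a pure state check.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/**
 * Hypothetical gate for a periodic health task. None of these names come
 * from Ozone Recon; they are assumptions made for this illustration.
 */
final class SteadyStateGate {
  private final LongSupplier nanoClock;          // injectable clock, e.g. System::nanoTime
  private final long startNanos;                 // clock reading when the gate was created
  private final long warmUpNanos;                // minimum wait before the task may run
  private final BooleanSupplier reportsComplete; // assumed "steady state reached" check

  SteadyStateGate(Duration warmUp, LongSupplier nanoClock,
                  BooleanSupplier reportsComplete) {
    this.nanoClock = nanoClock;
    this.startNanos = nanoClock.getAsLong();
    this.warmUpNanos = warmUp.toNanos();
    this.reportsComplete = reportsComplete;
  }

  /** True only after the warm-up has elapsed and the state check passes. */
  boolean mayRun() {
    boolean warmedUp = nanoClock.getAsLong() - startNanos >= warmUpNanos;
    return warmedUp && reportsComplete.getAsBoolean();
  }
}
```

A task loop would call mayRun() at the start of each iteration and skip the iteration when it returns false, so that no container is flagged as missing while reports are still arriving. How "steady state" should actually be determined from Recon's SCM state is left open by the description, which is why it is modelled here as an opaque predicate.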



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
