Posted to issues@ozone.apache.org by "Kohei Sugihara (Jira)" <ji...@apache.org> on 2023/02/08 09:01:00 UTC

[jira] [Created] (HDDS-7925) Potential deadlocks among all OMs in OM HA

Kohei Sugihara created HDDS-7925:
------------------------------------

             Summary: Potential deadlocks among all OMs in OM HA
                 Key: HDDS-7925
                 URL: https://issues.apache.org/jira/browse/HDDS-7925
             Project: Apache Ozone
          Issue Type: Bug
          Components: OM HA
    Affects Versions: 1.3.0
         Environment: Configuration: FSO enabled, OM HA, SCM HA
            Reporter: Kohei Sugihara


In our environment, we hit a timeout problem several times between December 2022 and January 2023: all OMs stopped responding to OM RPCs such as listing keys, getting keys, and getting OM roles. In every case, rebooting the OMs in the appropriate order recovered the service, but the problem recurred within a few days each time. We dug through the logs and traces of all OMs and found that the IPC Server Handler threads in every OM were full of OM requests waiting on either OzoneManagerLock or Ratis, including OM Ratis. Since we switched the OMs to non-HA, the cluster has run for two weeks without the problem recurring.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org