Posted to issues@ozone.apache.org by "Kohei Sugihara (Jira)" <ji...@apache.org> on 2023/02/08 09:01:00 UTC
[jira] [Created] (HDDS-7925) Potential deadlocks among all OMs in OM HA
Kohei Sugihara created HDDS-7925:
------------------------------------
Summary: Potential deadlocks among all OMs in OM HA
Key: HDDS-7925
URL: https://issues.apache.org/jira/browse/HDDS-7925
Project: Apache Ozone
Issue Type: Bug
Components: OM HA
Affects Versions: 1.3.0
Environment: Configuration: FSO enabled, OM HA, SCM HA
Reporter: Kohei Sugihara
In our environment, we hit a timeout problem several times between December 2022 and January 2023: all OMs stopped responding to OM RPCs such as listing keys, getting keys, and getting OM roles. In every case, rebooting the OMs in the appropriate order recovered the service, but the problem recurred within a few days each time. We dug through the logs and traces from all OMs and noticed that the IPC Server Handlers in every OM were full of OM requests waiting for either OzoneManagerLock or Ratis, including OM Ratis. After switching the OMs to non-HA, we have run for two weeks and the problem has not recurred.
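As a rough illustration of the kind of check behind the observation above, the following Java sketch counts threads whose name starts with a given prefix and whose stack contains a frame from a given class. It is not taken from Ozone: the class HandlerWaitSummary and the method countWithFrame are hypothetical, the thread-name prefix "IPC Server handler" is an assumption about how the RPC handler threads are named, and "OzoneManagerLock" is matched only as a substring of the frame's class name. The sketch inspects the JVM it runs in, so the same filtering would have to be applied to a thread dump taken from an OM.

```java
import java.util.Map;

// Hypothetical helper, not part of Ozone: summarizes which handler threads
// currently have a frame from a given class on their stack.
class HandlerWaitSummary {

    // Counts threads whose name starts with namePrefix and whose stack trace
    // contains at least one frame whose class name contains classFragment.
    static int countWithFrame(Map<Thread, StackTraceElement[]> traces,
                              String namePrefix,
                              String classFragment) {
        int count = 0;
        for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
            if (!entry.getKey().getName().startsWith(namePrefix)) {
                continue; // not one of the threads we are interested in
            }
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().contains(classFragment)) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Snapshot of all live threads in this JVM and their stacks.
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        // Assumed thread-name prefix and class-name fragment; adjust as needed.
        int inLock = countWithFrame(traces, "IPC Server handler", "OzoneManagerLock");
        System.out.println("Handlers with an OzoneManagerLock frame: " + inLock);
    }
}
```

A frame from the lock class only shows that a thread is somewhere inside that code; whether it is actually blocked still has to be read from the thread state and the full trace.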
--
This message was sent by Atlassian Jira
(v8.20.10#820010)