Posted to issues@ozone.apache.org by "Ritesh H Shukla (Jira)" <ji...@apache.org> on 2022/02/28 18:49:00 UTC

[jira] [Assigned] (HDDS-6345) OM always runs OOM in Kubernetes

     [ https://issues.apache.org/jira/browse/HDDS-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ritesh H Shukla reassigned HDDS-6345:
-------------------------------------

    Assignee: Ritesh H Shukla

> OM always runs OOM in Kubernetes 
> ---------------------------------
>
>                 Key: HDDS-6345
>                 URL: https://issues.apache.org/jira/browse/HDDS-6345
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Shawn
>            Assignee: Ritesh H Shukla
>            Priority: Major
>
> I deployed Ozone 1.21 to Kubernetes with security enabled and with OM HA and SCM HA. However, one of the OMs keeps getting restarted by Kubernetes because of OOM. Even after I assigned it 300GB of memory, the OM still keeps restarting due to OOM.
>  
> After analysis, we found that the OOM was caused by RocksDB. When the OM is restarted, it first tries to open RocksDB. During this time, RocksDB tries to run a compaction, which eventually hits OOM. So there are three questions:
>  
> 1. Why did the OM get into this state?
> 2. Why does RocksDB need so much memory to do the compaction?
> 3. How can this be resolved?
> Some info that may be useful: we deployed OM HA directly rather than migrating from a single OM to OM HA. The OM that has the issue is a follower, not the leader. The underlying PVC is backed by SSD. Our traffic is mostly large objects, hundreds of GBs in size.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org