Posted to issues@ozone.apache.org by "Glen Geng (Jira)" <ji...@apache.org> on 2020/10/29 09:12:00 UTC

[jira] [Created] (HDDS-4408) Datanode State Machine Thread needs handle OutOfMemoryError

Glen Geng created HDDS-4408:
-------------------------------

             Summary: Datanode State Machine Thread needs handle OutOfMemoryError
                 Key: HDDS-4408
                 URL: https://issues.apache.org/jira/browse/HDDS-4408
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: Ozone Datanode
    Affects Versions: 1.1.0
            Reporter: Glen Geng


In the Tencent production environment, we found several dead DNs (datanodes) that never came back.

We found that the thread "Datanode State Machine Thread - 0" was missing from the jstack output. Without that thread, no HeartbeatEndpointTask is created, so the DN is soon marked dead and does not recover until it is restarted.
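
The same check can also be made from inside the JVM rather than with jstack. The following is only a minimal sketch of that idea, not code from Ozone: the class name, the method names, and the thread name used in the demo are all hypothetical. It relies on Thread.getAllStackTraces(), which returns the live threads of the current JVM.

```java
import java.util.concurrent.CountDownLatch;

// Illustrative only: none of these names come from the Ozone code base.
class ThreadPresenceSketch {

  /** Returns true if a live thread with exactly this name exists in the JVM. */
  static boolean isThreadAlive(String name) {
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t.isAlive() && t.getName().equals(name)) {
        return true;
      }
    }
    return false;
  }

  /** Checks presence while a named thread is running and again after it exits. */
  static String demo() {
    String name = "Sketch State Machine Thread - 0"; // hypothetical thread name
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      started.countDown();
      try {
        release.await(); // stay alive until the demo releases us
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, name);
    worker.setDaemon(true);
    worker.start();
    try {
      started.await();
      boolean whileRunning = isThreadAlive(name);
      release.countDown();
      worker.join(); // after join the thread has terminated
      boolean afterExit = isThreadAlive(name);
      return whileRunning + "," + afterExit;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "true,false"
  }
}
```

A watchdog built on such a check could only report that the thread is gone; what to do about it (restart the thread, or stop the process) is a separate decision.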

 

After checking the .out log, we saw that an OutOfMemoryError occurred in the thread "Datanode State Machine Thread - 0", which killed the thread.
{code:java}
114370.799: Total time for which application threads were stopped: 1.0883622 seconds, Stopping threads took: 0.0002926 seconds
Exception in thread "Datanode State Machine Thread - 0" java.lang.OutOfMemoryError: GC overhead limit exceeded
114370.810: Application time: 0.0115941 seconds
{Heap before GC invocations=2946 (full 2680):
 PSYoungGen      total 3170304K, used 2846720K [0x00000006eab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2846720K, 100% used [0x00000006eab00000,0x0000000798700000,0x0000000798700000)
  from space 323584K, 0% used [0x00000007ac400000,0x00000007ac400000,0x00000007c0000000)
  to   space 324096K, 0% used [0x0000000798700000,0x0000000798700000,0x00000007ac380000)
 ParOldGen       total 6990848K, used 6990627K [0x0000000540000000, 0x00000006eab00000, 0x00000006eab00000)
  object space 6990848K, 99% used [0x0000000540000000,0x00000006eaac8c90,0x00000006eab00000)
 Metaspace       used 60721K, capacity 63446K, committed 64128K, reserved 1105920K
  class space    used 6583K, capacity 7031K, committed 7296K, reserved 1048576K
{code}
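
For illustration, here is a minimal sketch of one way a long-running worker loop could avoid dying silently on an OutOfMemoryError. It is not the Ozone implementation and not necessarily the fix chosen for this issue; the class, interface, and callback names are all hypothetical. The point it shows is that OutOfMemoryError is a java.lang.Error, so a loop that only catches Exception lets it escape and end the thread. The sketch reports recoverable exceptions and keeps looping, and hands any Error to a separate callback before stopping, so that the rest of the process can react (for example by shutting down so a supervisor restarts it).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Illustrative only: none of these names come from the Ozone code base.
class GuardedLoopSketch {

  /** Hypothetical stand-in for one iteration of the state machine loop. */
  interface Step {
    void run() throws Exception;
  }

  /**
   * Starts a thread that runs step up to maxIterations times.
   * Exceptions are reported and the loop continues. Errors such as
   * OutOfMemoryError are reported to onFatal and the loop stops, so the
   * failure is visible to the rest of the process instead of being silent.
   */
  static Thread startGuarded(String name, Step step, int maxIterations,
      Consumer<Exception> onRecoverable, Consumer<Error> onFatal) {
    Thread t = new Thread(() -> {
      for (int i = 0; i < maxIterations; i++) {
        try {
          step.run();
        } catch (Exception e) {
          onRecoverable.accept(e); // report and keep looping
        } catch (Error e) {
          onFatal.accept(e);       // report, then leave the loop
          return;
        }
      }
    }, name);
    t.setDaemon(true);
    t.start();
    return t;
  }

  /** Simulates an OutOfMemoryError on the third iteration and summarizes the outcome. */
  static String demo() {
    AtomicInteger runs = new AtomicInteger();
    AtomicReference<Error> fatal = new AtomicReference<>();
    Thread t = startGuarded("Sketch Thread - 0", () -> {
      if (runs.incrementAndGet() == 3) {
        throw new OutOfMemoryError("simulated");
      }
    }, 10, e -> { }, fatal::set);
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    }
    return runs.get() + " iterations, fatal=" + fatal.get().getClass().getSimpleName();
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "3 iterations, fatal=OutOfMemoryError"
  }
}
```

Whether it is safe to keep a process running after a real OutOfMemoryError is a design question this sketch does not answer; with the heap nearly full, as in the log above, the callback itself may fail to allocate.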


