Posted to mapreduce-user@hadoop.apache.org by yaoxiaohua <ya...@outlook.com> on 2015/12/17 03:36:20 UTC

NodeManager crashes with exceptions and OOM (urgent)

Hi guys,

Environment: Hadoop 2.3-cdh5.0.2. I have a cluster of about sixty nodes.
nn1 and nn2 are the HA NameNodes; dn1 through dn58 are data nodes, each
running a DataNode and a NodeManager.

The NodeManager on one of the data nodes keeps crashing after running some
containers, sometimes after a few hours, sometimes after only a few minutes.

 

Its configuration is the same as on the other data nodes. The kernel
parameters are not, because I have been tuning them for this issue.

 

I have spent a lot of time investigating this issue but have found no
solution. This is driving me crazy.

I have tuned the following Linux kernel parameters:

vm.overcommit_memory=1

vm.swappiness = 20

# for the page allocation failures reported in dmesg

vm.zone_reclaim_mode = 1

vm.min_free_kbytes = 65536 
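
Since the kernel parameters are the one known difference between this node and the others, it may help to confirm what the running kernel actually uses. The following is only a minimal sketch, assuming a Linux host where each sysctl key maps to a file under /proc/sys; the helper names (`sysctl_path`, `read_sysctl`, `find_mismatches`) are hypothetical and the expected values are simply the ones quoted above.

```python
from pathlib import PurePosixPath

# Values quoted in this post; adjust to whatever the node is meant to use.
EXPECTED = {
    "vm.overcommit_memory": "1",
    "vm.swappiness": "20",
    "vm.zone_reclaim_mode": "1",
    "vm.min_free_kbytes": "65536",
}


def sysctl_path(key):
    """Map a sysctl key such as 'vm.swappiness' to its /proc/sys file."""
    return str(PurePosixPath("/proc/sys") / key.replace(".", "/"))


def read_sysctl(key):
    """Read the live value of one kernel parameter (Linux only)."""
    with open(sysctl_path(key)) as handle:
        return handle.read().strip()


def find_mismatches(expected, reader=read_sysctl):
    """Return {key: (expected, actual)} for every parameter that differs."""
    mismatches = {}
    for key, want in expected.items():
        got = reader(key)
        if got != want:
            mismatches[key] = (want, got)
    return mismatches
```

Calling `find_mismatches(EXPECTED)` on the node returns an empty dict when every live value matches. The `reader` argument is there so the same comparison can be run against values collected from another node.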

 

I also changed the NodeManager's GC policy from gencon to optthruput.

Process log:

2015-12-16 17:24:39,663 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error [id: 0x34ed4c97, /172.19.206.148:34641 => /172.19.206.142:8080] EXCEPTION: java.lang.ArrayIndexOutOfBoundsException
2015-12-16 17:24:39,663 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Container Monitor,5,main] threw an Error.  Shutting down now...
java.lang.OutOfMemoryError: Java heap space
        at java.util.HashMap.inflateTable(HashMap.java:328)
        at java.util.HashMap.<init>(HashMap.java:308)
        at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.updateProcessTree(ProcfsBasedProcessTree.java:154)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:390)
2015-12-16 17:24:39,666 INFO org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException

 

Before the crash, the log always contains errors like these:

2015-12-16 17:24:35,336 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 19947 for container-id container_1448915696877_23390_01_000037: 102.6 MB of 2 GB physical memory used; 2.1 GB of 4.2 GB virtual memory used
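
To see how container memory use develops in the minutes before a crash, these ContainersMonitorImpl lines can be extracted from the log. This is a rough sketch, not part of Hadoop: it assumes each entry sits on a single unwrapped line in the log file, that the sizes use binary units (1 KB = 1024 B), and the names `USAGE_RE`, `to_bytes` and `parse_usage` are made up for illustration.

```python
import re

# Matches the "Memory usage of ProcessTree ..." message shown above.
USAGE_RE = re.compile(
    r"Memory usage of ProcessTree (?P<pid>\d+) for container-id "
    r"(?P<container>container_\w+): "
    r"(?P<phys_used>[\d.]+ ?\w+) of (?P<phys_limit>[\d.]+ ?\w+) "
    r"physical memory used; "
    r"(?P<virt_used>[\d.]+ ?\w+) of (?P<virt_limit>[\d.]+ ?\w+) "
    r"virtual memory used"
)

SIZE_RE = re.compile(r"(?P<number>\d+(?:\.\d+)?) ?(?P<unit>[KMGT]?B)")

# Assumption: binary multiples.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def to_bytes(text):
    """Convert a size such as '102.6 MB' to an integer number of bytes."""
    match = SIZE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError("unrecognised size: %r" % text)
    return int(float(match.group("number")) * UNITS[match.group("unit")])


def parse_usage(line):
    """Return the usage figures from one log line, or None if it has none."""
    match = USAGE_RE.search(line)
    if match is None:
        return None
    result = {"pid": int(match.group("pid")),
              "container": match.group("container")}
    for field in ("phys_used", "phys_limit", "virt_used", "virt_limit"):
        result[field] = to_bytes(match.group(field))
    return result
```

Running `parse_usage` over every line of the NodeManager log and keeping the non-None results gives one record per container per monitoring pass, which can then be sorted by timestamp or container id.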

2015-12-16 17:24:38,379 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Uncaught exception in ContainerMemoryManager while managing memory of container_1448915696877_23390_01_000543
java.lang.IllegalArgumentException: disparate values
        at sun.misc.FDBigInt.quoRemIteration(FloatingDecimal.java:2931)
        at sun.misc.FormattedFloatingDecimal.dtoa(FormattedFloatingDecimal.java:922)
        at sun.misc.FormattedFloatingDecimal.<init>(FormattedFloatingDecimal.java:542)
        at java.util.Formatter$FormatSpecifier.print(Formatter.java:3264)
        at java.util.Formatter$FormatSpecifier.print(Formatter.java:3202)
        at java.util.Formatter$FormatSpecifier.printFloat(Formatter.java:2769)
        at java.util.Formatter$FormatSpecifier.print(Formatter.java:2720)
        at java.util.Formatter.format(Formatter.java:2500)
        at java.util.Formatter.format(Formatter.java:2435)
        at java.lang.String.format(String.java:2148)
        at org.apache.hadoop.util.StringUtils.format(StringUtils.java:123)
        at org.apache.hadoop.util.StringUtils$TraditionalBinaryPrefix.long2String(StringUtils.java:758)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.formatUsageString(ContainersMonitorImpl.java:487)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:399)

 

2015-12-16 17:24:38,516 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Uncaught exception in ContainerMemoryManager while managing memory of container_1448915696877_23390_01_000374
java.lang.ArrayIndexOutOfBoundsException
        at sun.misc.FormattedFloatingDecimal.dtoa(FormattedFloatingDecimal.java:848)
        at sun.misc.FormattedFloatingDecimal.<init>(FormattedFloatingDecimal.java:542)
        at java.util.Formatter$FormatSpecifier.print(Formatter.java:3264)
        at java.util.Formatter$FormatSpecifier.print(Formatter.java:3202)
        at java.util.Formatter$FormatSpecifier.printFloat(Formatter.java:2769)
        at java.util.Formatter$FormatSpecifier.print(Formatter.java:2720)
        at java.util.Formatter.format(Formatter.java:2500)
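
To get a feel for how often each kind of exception shows up before a crash, the log can be tallied by log level and exception class. The sketch below is only illustrative: it assumes each log entry begins with a timestamp like the ones above and that the exception class appears on its own line right after the entry. It does not count exceptions mentioned inline, such as the one on the ShuffleHandler line. The function name `count_throwables` is hypothetical.

```python
import re
from collections import Counter

# A log entry starts with a timestamp such as "2015-12-16 17:24:38,379 WARN".
ENTRY_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (?P<level>[A-Z]+) "
)

# A stack-trace header, e.g. "java.lang.OutOfMemoryError: Java heap space".
THROWABLE_RE = re.compile(r"^(?P<cls>[\w.$]+(?:Exception|Error))(?::|\s|$)")


def count_throwables(lines):
    """Count (log level, exception class) pairs found in the given lines."""
    counts = Counter()
    level = None
    for line in lines:
        entry = ENTRY_RE.match(line)
        if entry:
            # Remember the level of the entry that a following trace belongs to.
            level = entry.group("level")
            continue
        thrown = THROWABLE_RE.match(line)
        if thrown and level is not None:
            counts[(level, thrown.group("cls"))] += 1
    return counts
```

Passing an open log file to `count_throwables` and printing `most_common()` on the result lists the most frequent exception types first.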

 

 

 

Best Regards,

Evan Yao