Posted to user@hadoop.apache.org by Ian Brooks <i....@sensewhere.com> on 2014/05/14 14:47:32 UTC

Constant high cpu usage on datanode process

Hi,

I have noticed recently that the datanodes on a test HDFS cluster I'm running sit permanently at 100% CPU usage after a few hours of minimal use, and over time climb to 200% CPU even when there is almost nothing happening on HDFS.

Node 1:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                            
15607 hadoop    20   0 1746m 480m 5380 S 102.2  6.1 190:24.45 java                  

ps -ef |grep 15607

hadoop   15607     1 14 May13 ?        03:12:33 /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/local/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/local/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop-hadoop-datanode-sw-hmaster-001.log -Dhadoop.home.dir=/usr/local/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/local/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode




Node 6:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                            
10809 hadoop    20   0 1750m 497m  17m S 200.0  6.3 277:07.34 java                               

ps -ef |grep 10809

hadoop   10809     1 99 09:56 ?        04:38:04 /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/local/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/local/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop-hadoop-datanode-sw-hadoop-006.log -Dhadoop.home.dir=/usr/local/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/local/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dcom.sun.management.jmxremote.port=1235 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode



I thought initially it might be GC, but monitoring the datanode process via JMX shows almost no GC activity (< 1%).
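For anyone wanting to double-check the GC figure from the command line, the same estimate can be made from jstat's cumulative GC-time columns (a sketch; the sample values below are hypothetical, not from my cluster):

```shell
# jstat -gcutil <datanode-pid> 1000 prints GC utilization once per second;
# YGCT + FGCT give cumulative young/full GC time in seconds.
# (Values below are made up for illustration.)
gc_time=3.2       # YGCT + FGCT in seconds (hypothetical)
uptime=11400      # JVM uptime in seconds (hypothetical)

# GC overhead as a percentage of wall-clock time:
awk -v g="$gc_time" -v u="$uptime" 'BEGIN { printf "%.2f%%\n", 100 * g / u }'
# -> 0.03%, i.e. well under 1%
```

If those columns barely move between samples, GC can be ruled out as the source of the CPU burn.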

If I restart the datanode process then CPU usage drops back down to 1-5% CPU for a few hours, then jumps back up to 100% again.

The version of Hadoop I'm using is hadoop-2.3.0 and the main use is for HBase, which is also not handling much traffic on the test platform. There are 8 nodes in the cluster, each with 8GB RAM; 4GB is assigned to HBase, and top shows 1.9GB RAM free.

Hadoop heap size is set to 1GB with usage varying between 100 and 200MB. VisualVM shows most of the CPU time to be in:

org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService() - 33.5%
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run() - 23.3%
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select() - 16.9%
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reportReceivedDeletedBlocks() - 12.9%
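In case anyone wants to correlate the busy OS threads with Java stack frames directly, this is roughly how (a sketch; 15607 is the datanode PID from Node 1 above, and the actual thread ID to convert would come from top's per-thread output, not the process PID used here as a stand-in):

```shell
# Per-thread CPU usage for the datanode process (15607 = PID from Node 1):
top -b -H -n 1 -p 15607 | head -20

# jstack prints thread IDs as hex "nid" values, so convert the busiest
# thread's decimal ID (using the process PID here only as an example):
printf '0x%x\n' 15607    # -> 0x3cf7

# Then pull the matching thread's stack out of a dump:
jstack 15607 | grep -A 10 'nid=0x3cf7'
```

A few dumps taken seconds apart should show whether the hot threads are stuck spinning in the same frames each time.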


Any ideas why this would be happening? Has anyone else seen this behaviour?

-Ian Brooks