Posted to mapreduce-user@hadoop.apache.org by Shayan Pooya <sh...@liveve.org> on 2014/06/03 18:34:36 UTC
100% CPU utilization on idle HDFS Data Nodes
I have a three-node HDFS cluster with a name-node. There is absolutely no
IO going on in this cluster or any jobs running on it; I just use it for
testing the Disco HDFS integration.
I noticed that two of the three data-nodes are using 100% CPU. They have
been running for a long time (2 months) but with no jobs running on them
and almost no usage:
$ hadoop version
Hadoop 2.3.0
Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1567123
Compiled by jenkins on 2014-02-11T13:40Z
Compiled with protoc 2.5.0
From source with checksum dfe46336fbc6a044bc124392ec06b85
Is this a known bug?
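Two java processes at ~100% CPU with identical memory footprints are usually two threads of the same DataNode JVM (top and ps can list threads as separate entries). A common way to find out which thread is burning the CPU on Linux, sketched here using the PIDs from the listing below as assumed values, is to list per-thread CPU with `top -H`, convert the hot thread id to hex, and look for the matching `nid` in a jstack dump:

```shell
DN_PID=25050    # DataNode JVM PID (hypothetical; taken from the ps output in this thread)
HOT_TID=25094   # hot thread/LWP id as reported by top -H or ps -L

# 1. Per-thread CPU usage inside the DataNode JVM
top -b -H -n 1 -p "$DN_PID" | head -20 || true

# 2. jstack prints native thread ids as hexadecimal "nid" values,
#    so convert the decimal LWP id before searching
NID=$(printf '%x' "$HOT_TID")
echo "search the jstack output for nid=0x$NID"

# 3. Show the stack of the matching thread in a thread dump
jstack "$DN_PID" | grep -A 20 "nid=0x$NID" || true
```

With the hot thread's stack in hand, the busy-looping subsystem (a responder, scanner, or similar named thread) can usually be identified from the thread name alone.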
Re: 100% CPU utilization on idle HDFS Data Nodes
Posted by Shayan Pooya <sh...@liveve.org>.
1. The jstack result is attached.
2. These are the two processes:
25050 shayan 20 0 1643M 87872 2512 S 100. 10.1 1434h
/usr/lib/jvm/java-1.6.0//bin/java -Dproc_datanode -Xmx1000m
-Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/shayan/hadoop/logs
-Dhadoo
25094 shayan 20 0 1643M 87872 2512 R 99.0 10.1 1422h
/usr/lib/jvm/java-1.6.0//bin/java -Dproc_datanode -Xmx1000m
-Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/shayan/hadoop/logs
-Dhadoo
3. There is not much log activity on that node. Here is the tail:
2014-06-02 18:14:18,552 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report,
processed
command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@f81c5a2
2014-06-02 23:36:44,764 INFO
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool
BP-436147530-10.101.5.171-1394551811526 Total blocks: 13, missing metadata
files:0, missing block files:0, missing blocks in memory:0, mismatched
blocks:0
2014-06-03 00:14:18,544 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 13 blocks
took 0 msec to generate and 1 msecs for RPC and NN processing
2014-06-03 00:14:18,544 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report,
processed
command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@41f149ee
2014-06-03 05:36:44,764 INFO
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool
BP-436147530-10.101.5.171-1394551811526 Total blocks: 13, missing metadata
files:0, missing block files:0, missing blocks in memory:0, mismatched
blocks:0
2014-06-03 06:14:18,548 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 13 blocks
took 0 msec to generate and 5 msecs for RPC and NN processing
2014-06-03 06:14:18,548 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report,
processed
command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@9ebe5fa
2014-06-03 11:36:44,764 INFO
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool
BP-436147530-10.101.5.171-1394551811526 Total blocks: 13, missing metadata
files:0, missing block files:0, missing blocks in memory:0, mismatched
blocks:0
2014-06-03 12:14:18,545 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 13 blocks
took 0 msec to generate and 2 msecs for RPC and NN processing
2014-06-03 12:14:18,545 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report,
processed
command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@419662d7
On Tue, Jun 3, 2014 at 12:44 PM, Ian Brooks <i....@sensewhere.com> wrote:
> Hi Shayan,
>
> If you restart one of the datanodes, does that node go back to normal CPU
> usage? If so, it looks like the same issue I'm seeing on my nodes, though
> mine will go to 200% over time on a 4-CPU host.
>
> I haven't been able to track the cause down yet. Heavy use of HDFS will
> make a node jump to 100% sooner, and it stays there even when it is
> doing very little.
>
>
>
> Ted, logs from one of my nodes to go with Shayan's
>
> 2014-06-03 17:01:28,755 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder:
> BP-2121456822-10.143.38.149-1396953188241:blk_1074075941_335194,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2014-06-03 17:06:41,860 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
> Scheduling blk_1074074152_333405 file
> /home/hadoop/hdfs/datanode/current/BP-2121456822-10.143.38.149-1396953188241/current/finalized/subdir12/blk_1074074152
> for deletion
> 2014-06-03 17:06:41,871 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
> Deleted BP-2121456822-10.143.38.149-1396953188241 blk_1074074152_333405
> file
> /home/hadoop/hdfs/datanode/current/BP-2121456822-10.143.38.149-1396953188241/current/finalized/subdir12/blk_1074074152
> 2014-06-03 17:08:32,843 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Got a command from standby
> NN - ignoring command:2
> 2014-06-03 17:13:44,320 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving
> BP-2121456822-10.143.38.149-1396953188241:blk_1074075943_335196 src: /
> 10.143.38.100:26618 dest: /10.143.38.100:50010
> 2014-06-03 17:13:44,351 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 10.143.38.100:25817, dest: /10.143.38.100:50010, bytes: 863, op:
> HDFS_WRITE, cliID:
> DFSClient_hb_rs_sw-hmaster-002,16020,1401717607010_805952715_28, offset: 0,
> srvID: 62f7174f-a44d-4cc1-b62c-095782a86164, blockid:
> BP-2121456822-10.143.38.149-1396953188241:blk_1074075489_334742, duration:
> 3600148963947
> 2014-06-03 17:13:44,352 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder:
> BP-2121456822-10.143.38.149-1396953188241:blk_1074075489_334742,
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2014-06-03 17:13:54,029 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 10.143.38.104:21368, dest: /10.143.38.100:50010, bytes: 227791, op:
> HDFS_WRITE, cliID:
> DFSClient_hb_rs_sw-hadoop-004,16020,1401718647248_-466626109_28, offset: 0,
> srvID: 62f7174f-a44d-4cc1-b62c-095782a86164, blockid:
> BP-2121456822-10.143.38.149-1396953188241:blk_1074075491_334744, duration:
> 3600133166499
> 2014-06-03 17:13:54,029 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder:
> BP-2121456822-10.143.38.149-1396953188241:blk_1074075491_334744,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2014-06-03 17:36:17,405 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving
> BP-2121456822-10.143.38.149-1396953188241:blk_1074075950_335203 src: /
> 10.143.38.149:48959 dest: /10.143.38.100:50010
>
>
> top shows
> top - 17:41:55 up 18 days, 23:59, 2 users, load average: 1.06, 1.04, 0.93
> Tasks: 139 total, 1 running, 137 sleeping, 1 stopped, 0 zombie
> Cpu(s): 16.4%us, 25.9%sy, 0.0%ni, 56.2%id, 0.0%wa, 0.0%hi, 0.0%si,
> 1.4%st
> Mem: 8059432k total, 5870572k used, 2188860k free, 181076k buffers
> Swap: 835576k total, 0k used, 835576k free, 2493828k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 25450 hadoop 20 0 1728m 332m 17m S 95.8 4.2 49:24.47 java
> 25147 hbase 20 0 4998m 284m 15m S 13.4 3.6 7:28.30 java
> 7212 flume 20 0 4558m 399m 20m S 1.9 5.1 0:25.61 java
>
> jstack fails; I'm probably missing something, just not sure what:
>
> sudo jstack -J-d64 -m 25450
> Attaching to process ID 25450, please wait...
> Debugger attached successfully.
> Server compiler detected.
> JVM version is 24.51-b03
> Exception in thread "main" java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
> at sun.tools.jstack.JStack.main(JStack.java:102)
> Caused by: java.lang.RuntimeException: Unable to deduce type of thread
> from address 0x00007f0104010800 (expected type JavaThread, CompilerThread,
> ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)
> at
> sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:162)
> at sun.jvm.hotspot.runtime.Threads.first(Threads.java:150)
> at sun.jvm.hotspot.tools.PStack.initJFrameCache(PStack.java:216)
> at sun.jvm.hotspot.tools.PStack.run(PStack.java:67)
> at sun.jvm.hotspot.tools.PStack.run(PStack.java:54)
> at sun.jvm.hotspot.tools.PStack.run(PStack.java:49)
> at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
> at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
> at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
> ... 6 more
> Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for
> type of address 0x00007f0104010800
> at
> sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
> at
> sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
> at
> sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:158)
> ... 14 more
>
>
> -Ian Brooks
>
>
> On Tuesday 03 Jun 2014 12:34:36 Shayan Pooya wrote:
> > I have a three node HDFS cluster with a name-node. There is absolutely
> > no IO going on in this cluster or any jobs running on it; I just use it for
> > testing the Disco HDFS integration.
> > I noticed that two of the three data-nodes are using 100% CPU. They have
> > been running for a long time (2 months) but with no jobs running on them
> > and almost no usage:
> >
> > $ hadoop version
> > Hadoop 2.3.0
> > Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1567123
> > Compiled by jenkins on 2014-02-11T13:40Z
> > Compiled with protoc 2.5.0
> > From source with checksum dfe46336fbc6a044bc124392ec06b85
> >
> > Is this a known bug?
> --
> -Ian Brooks
> Senior server administrator - Sensewhere
>
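When `jstack -m` dies with the serviceability-agent errors shown above, two fallbacks are worth trying (a sketch; the PID and the `hadoop` user come from Ian's top output and are assumptions): run plain `jstack` as the user that owns the JVM, since only mixed-mode `-m` needs the debugger attach that is failing, or send the process SIGQUIT (the classic `kill -3`), which makes HotSpot itself print a full thread dump without attaching anything:

```shell
DN_PID=25450   # DataNode PID from the top output above (assumption)

# Option 1: plain jstack run as the JVM's owner; only mixed-mode (-m)
# needs the serviceability agent that raised WrongTypeException
sudo -u hadoop jstack "$DN_PID" || true

# Option 2: SIGQUIT (signal 3) asks HotSpot to dump all thread stacks
# to stdout; guard so we only signal a process that really is a DataNode
if grep -q proc_datanode "/proc/$DN_PID/cmdline" 2>/dev/null; then
    kill -QUIT "$DN_PID"
fi
```

Note that the SIGQUIT dump lands in the DataNode's `.out` file under the log directory, not in the `.log` file.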
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 25450 hadoop 20 0 1728m 332m 17m S 95.8 4.2 49:24.47 java
> 25147 hbase 20 0 4998m 284m 15m S 13.4 3.6 7:28.30 java
> 7212 flume 20 0 4558m 399m 20m S 1.9 5.1 0:25.61 java
>
> jstack fails; I'm probably missing something, just not sure what.
>
> sudo jstack -J-d64 -m 25450
> Attaching to process ID 25450, please wait...
> Debugger attached successfully.
> Server compiler detected.
> JVM version is 24.51-b03
> Exception in thread "main" java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
> at sun.tools.jstack.JStack.main(JStack.java:102)
> Caused by: java.lang.RuntimeException: Unable to deduce type of thread
> from address 0x00007f0104010800 (expected type JavaThread, CompilerThread,
> ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)
> at
> sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:162)
> at sun.jvm.hotspot.runtime.Threads.first(Threads.java:150)
> at sun.jvm.hotspot.tools.PStack.initJFrameCache(PStack.java:216)
> at sun.jvm.hotspot.tools.PStack.run(PStack.java:67)
> at sun.jvm.hotspot.tools.PStack.run(PStack.java:54)
> at sun.jvm.hotspot.tools.PStack.run(PStack.java:49)
> at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
> at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
> at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
> ... 6 more
> Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for
> type of address 0x00007f0104010800
> at
> sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
> at
> sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
> at
> sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:158)
> ... 14 more
>
>
> -Ian Brooks
>
>
> On Tuesday 03 Jun 2014 12:34:36 Shayan Pooya wrote:
> > I have a three node HDFS cluster with a name-node. There is absolutely
> no
> > IO going on this cluster or any jobs running on it and I just use it for
> > testing the Disco HDFS integration.
> > I noticed that two of the three data-nodes are using 100% CPU. They have
> > been running for a long time (2 months) but with no jobs running on them
> > and almost no usage:
> >
> > $ hadoop version
> > Hadoop 2.3.0
> > Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1567123
> > Compiled by jenkins on 2014-02-11T13:40Z
> > Compiled with protoc 2.5.0
> > From source with checksum dfe46336fbc6a044bc124392ec06b85
> >
> > Is this a known bug?
> --
> -Ian Brooks
> Senior server administrator - Sensewhere
>
Re: 100% CPU utilization on idle HDFS Data Nodes
Posted by Ian Brooks <i....@sensewhere.com>.
Hi Shayan,
If you restart one of the datanodes, does that node go back to normal CPU usage? If so, that looks like the same issue I'm seeing on my nodes, though mine will climb to 200% over time on a 4-CPU host.
I haven't been able to track the cause down yet. Heavy use of HDFS will cause the node to jump to 100% sooner, and it stays there even when it is doing very little.
Ted, logs from one of my nodes to go with Shayan's
2014-06-03 17:01:28,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2121456822-10.143.38.149-1396953188241:blk_1074075941_335194, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2014-06-03 17:06:41,860 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1074074152_333405 file /home/hadoop/hdfs/datanode/current/BP-2121456822-10.143.38.149-1396953188241/current/finalized/subdir12/blk_1074074152 for deletion
2014-06-03 17:06:41,871 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-2121456822-10.143.38.149-1396953188241 blk_1074074152_333405 file /home/hadoop/hdfs/datanode/current/BP-2121456822-10.143.38.149-1396953188241/current/finalized/subdir12/blk_1074074152
2014-06-03 17:08:32,843 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Got a command from standby NN - ignoring command:2
2014-06-03 17:13:44,320 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2121456822-10.143.38.149-1396953188241:blk_1074075943_335196 src: /10.143.38.100:26618 dest: /10.143.38.100:50010
2014-06-03 17:13:44,351 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.143.38.100:25817, dest: /10.143.38.100:50010, bytes: 863, op: HDFS_WRITE, cliID: DFSClient_hb_rs_sw-hmaster-002,16020,1401717607010_805952715_28, offset: 0, srvID: 62f7174f-a44d-4cc1-b62c-095782a86164, blockid: BP-2121456822-10.143.38.149-1396953188241:blk_1074075489_334742, duration: 3600148963947
2014-06-03 17:13:44,352 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2121456822-10.143.38.149-1396953188241:blk_1074075489_334742, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2014-06-03 17:13:54,029 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.143.38.104:21368, dest: /10.143.38.100:50010, bytes: 227791, op: HDFS_WRITE, cliID: DFSClient_hb_rs_sw-hadoop-004,16020,1401718647248_-466626109_28, offset: 0, srvID: 62f7174f-a44d-4cc1-b62c-095782a86164, blockid: BP-2121456822-10.143.38.149-1396953188241:blk_1074075491_334744, duration: 3600133166499
2014-06-03 17:13:54,029 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2121456822-10.143.38.149-1396953188241:blk_1074075491_334744, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2014-06-03 17:36:17,405 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2121456822-10.143.38.149-1396953188241:blk_1074075950_335203 src: /10.143.38.149:48959 dest: /10.143.38.100:50010
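One detail worth noting in the clienttrace lines above: the duration field is reported in nanoseconds, so values like 3600148963947 work out to almost exactly one hour, which would be consistent with an HBase region server holding its WAL write pipeline open until an hourly log roll rather than with a runaway write. A quick sanity check of that conversion:

```shell
# clienttrace durations are in nanoseconds; integer-divide to seconds.
echo $((3600148963947 / 1000000000))   # prints 3600, i.e. about one hour
```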
top shows
top - 17:41:55 up 18 days, 23:59, 2 users, load average: 1.06, 1.04, 0.93
Tasks: 139 total, 1 running, 137 sleeping, 1 stopped, 0 zombie
Cpu(s): 16.4%us, 25.9%sy, 0.0%ni, 56.2%id, 0.0%wa, 0.0%hi, 0.0%si, 1.4%st
Mem: 8059432k total, 5870572k used, 2188860k free, 181076k buffers
Swap: 835576k total, 0k used, 835576k free, 2493828k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25450 hadoop 20 0 1728m 332m 17m S 95.8 4.2 49:24.47 java
25147 hbase 20 0 4998m 284m 15m S 13.4 3.6 7:28.30 java
7212 flume 20 0 4558m 399m 20m S 1.9 5.1 0:25.61 java
jstack fails; I'm probably missing something, just not sure what.
sudo jstack -J-d64 -m 25450
Attaching to process ID 25450, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.51-b03
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: java.lang.RuntimeException: Unable to deduce type of thread from address 0x00007f0104010800 (expected type JavaThread, CompilerThread, ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)
at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:162)
at sun.jvm.hotspot.runtime.Threads.first(Threads.java:150)
at sun.jvm.hotspot.tools.PStack.initJFrameCache(PStack.java:216)
at sun.jvm.hotspot.tools.PStack.run(PStack.java:67)
at sun.jvm.hotspot.tools.PStack.run(PStack.java:54)
at sun.jvm.hotspot.tools.PStack.run(PStack.java:49)
at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
... 6 more
Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007f0104010800
at sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
at sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:158)
... 14 more
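The `-m` (mixed-mode) flag routes jstack through the HotSpot Serviceability Agent, and the WrongTypeException above is the SA failing to walk the target JVM's thread list. A sketch of less fragile alternatives, assuming a Linux host and the PID 25450 from the top output above (the TID 25462 below is made up for illustration):

```shell
# Plain jstack (no -m) uses the attach API instead of the SA and often
# succeeds where mixed mode fails; run it as the user that owns the JVM.
sudo -u hadoop jstack 25450 > /tmp/datanode.jstack

# If attaching fails entirely, SIGQUIT makes the JVM print a thread dump
# to its own stdout (the DataNode's .out log) without killing the process.
sudo kill -3 25450

# To find which thread is burning CPU: list per-thread usage (the number
# of header lines to skip can vary by top version), then convert the
# busiest TID to hex, since jstack reports native thread IDs as nid=0x...
top -b -H -n 1 -p 25450 | tail -n +8 | sort -k9 -nr | head -5
printf 'nid=0x%x\n' 25462    # 25462: hypothetical busy TID from top -H
```

Grepping /tmp/datanode.jstack for the printed nid value then shows the stack of the spinning thread, which is usually enough to identify the culprit.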
-Ian Brooks
On Tuesday 03 Jun 2014 12:34:36 Shayan Pooya wrote:
> I have a three node HDFS cluster with a name-node. There is absolutely no
> IO going on this cluster or any jobs running on it and I just use it for
> testing the Disco HDFS integration.
> I noticed that two of the three data-nodes are using 100% CPU. They have
> been running for a long time (2 months) but with no jobs running on them
> and almost no usage:
>
> $ hadoop version
> Hadoop 2.3.0
> Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1567123
> Compiled by jenkins on 2014-02-11T13:40Z
> Compiled with protoc 2.5.0
> From source with checksum dfe46336fbc6a044bc124392ec06b85
>
> Is this a known bug?
--
-Ian Brooks
Senior server administrator - Sensewhere
Re: 100% CPU utilization on idle HDFS Data Nodes
Posted by Ted Yu <yu...@gmail.com>.
Can you pastebin a data node log snippet and a jstack of the datanode process?
Thanks
On Tue, Jun 3, 2014 at 9:34 AM, Shayan Pooya <sh...@liveve.org> wrote:
> I have a three node HDFS cluster with a name-node. There is absolutely no
> IO going on this cluster or any jobs running on it and I just use it for
> testing the Disco HDFS integration.
> I noticed that two of the three data-nodes are using 100% CPU. They have
> been running for a long time (2 months) but with no jobs running on them
> and almost no usage:
>
> $ hadoop version
> Hadoop 2.3.0
> Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1567123
> Compiled by jenkins on 2014-02-11T13:40Z
> Compiled with protoc 2.5.0
> From source with checksum dfe46336fbc6a044bc124392ec06b85
>
> Is this a known bug?
>