Posted to mapreduce-user@hadoop.apache.org by David Ginzburg <gi...@hotmail.com> on 2011/06/30 19:01:57 UTC
Dead data nodes during job execution and failed tasks.
Hi,
I am running a certain job that consistently causes data nodes to be marked dead (they come back later, spontaneously).
This causes some tasks to fail. My max file-descriptor limit is 64k.
Can anyone identify what is causing this?
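For reference, this is how I check the limit the DataNode actually runs with (a sketch, assuming Linux; per-process limits can differ from the login shell's, and the pgrep pattern is an assumption about the deployment):

```shell
#!/bin/sh
# Show the shell's own fd limit, then the effective limit and current fd
# count of a running DataNode process (limits are per-process, so the
# daemon may run with a different limit than the shell that reports 64k).
echo "shell soft fd limit: $(ulimit -n)"

# The class name used for matching is an assumption; adjust as needed.
pid=$(pgrep -f 'org.apache.hadoop.hdfs.server.datanode.DataNode' | head -n 1)
if [ -n "$pid" ]; then
    grep 'Max open files' "/proc/$pid/limits"
    echo "fds in use: $(ls "/proc/$pid/fd" | wc -l)"
else
    echo "no DataNode process found"
fi
```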
Here are logs from the failed task and the corresponding DataNode log snippets:
Slave running the map task:
2011-06-30 18:49:00,338 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2011-06-30 18:49:00,583 WARN org.apache.hadoop.conf.Configuration:
/mnt1/tmp/hadoop-0.20/cache/root/mapred/local/taskTracker/jobcache/job_201106290946_0605/attempt_201106290946_0605_m_004201_0/job.xml:a
attempt to override final parameter: dfs.hosts.exclude;
Ignoring.
2011-06-30 18:50:00,845 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-3550309578268660022_44008639 from any node: java.io.IOException: No live nodes contain current block
2011-06-30 18:51:03,865 INFO org.apache.hadoop.hdfs.DFSClient: Could not
obtain block blk_-3550309578268660022_44008639 from any node:
java.io.IOException: No live nodes contain current block
2011-06-30 18:52:07,003 INFO org.apache.hadoop.hdfs.DFSClient: Could not
obtain block blk_-3550309578268660022_44008639 from any node:
java.io.IOException: No live nodes contain current block
2011-06-30 18:54:10,075 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
java.io.IOException: Could not obtain block:
blk_-3550309578268660022_44008639
file=/user/root/user_login_log/distinct_data/20110629/part-00124
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1797)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1623)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1752)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-06-30 18:54:12,576 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: Could not obtain block:
blk_-3550309578268660022_44008639
file=/user/root/user_login_log/distinct_data/20110629/part-00124
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1797)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1623)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1752)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-06-30 18:54:14,635 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
DataNode on slave25
2011-06-30 18:52:38,126 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/172.28.1.125:50010, dest: /172.28.1.119:48947, bytes: 0, op: HDFS_READ,
cliID: DFSClient_attempt_201106290946_0605_m_004201_0, offset: 0,
srvID: DS-2025332107-172.28.1.125-50010-1300893119361,
blockid: blk_-3550309578268660022_44008639, duration: 146739000
2011-06-30 18:52:38,126 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/172.28.1.125:50010, dest: /172.28.1.119:55630, bytes: 0, op: HDFS_READ,
cliID: DFSClient_attempt_201106290946_0605_m_004201_0, offset: 0,
srvID: DS-2025332107-172.28.1.125-50010-1300893119361,
blockid: blk_-3550309578268660022_44008639, duration: 147680000
2011-06-30 18:52:38,126 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/172.28.1.125:50010, dest: /172.28.1.119:48589, bytes: 0, op: HDFS_READ,
cliID: DFSClient_attempt_201106290946_0605_m_004201_0, offset: 0,
srvID: DS-2025332107-172.28.1.125-50010-1300893119361,
blockid: blk_-3550309578268660022_44008639, duration: 147021000
2011-06-30 18:52:38,126 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/172.28.1.125:50010, dest: /172.28.1.119:49278, bytes: 0, op: HDFS_READ,
cliID: DFSClient_attempt_201106290946_0605_m_004201_0, offset: 0,
srvID: DS-2025332107-172.28.1.125-50010-1300893119361,
blockid: blk_-3550309578268660022_44008639, duration: 137582000
2011-06-30 18:52:38,135 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/172.28.1.125:50010, dest: /172.28.1.138:47818, bytes: 0, op: HDFS_READ,
cliID: DFSClient_attempt_201106290946_0605_m_004252_0, offset: 0,
srvID: DS-2025332107-172.28.1.125-50010-1300893119361,
blockid: blk_341656939579646802_43116464, duration: 30340000
2011-06-30 18:52:38,184 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/172.28.1.125:50010, dest: /172.28.1.115:51798, bytes: 0, op: HDFS_READ,
cliID: DFSClient_attempt_201106290946_0605_m_004932_0, offset: 0,
srvID: DS-2025332107-172.28.1.125-50010-1300893119361,
blockid: blk_1807082107808304792_41330622, duration: 195543000
2011-06-30 18:52:38,212 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
receiveBlock for block blk_-7111600897613399673_44069484
java.io.EOFException: while trying to read 65557 bytes
2011-06-30 18:52:38,212 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
receiveBlock for block blk_-3883253381457682039_44069367
java.io.EOFException: while trying to read 65557 bytes
2011-06-30 18:52:38,215 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-3883253381457682039_44069367 1 Exception
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:882)
at java.lang.Thread.run(Thread.java:619)
2011-06-30 18:52:38,215
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-7111600897613399673_44069484 1 Exception
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:882)
at java.lang.Thread.run(Thread.java:619)
2011-06-30 18:52:38,215
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-3883253381457682039_44069367 1 : Thread is interrupted.
2011-06-30 18:52:38,215 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-7111600897613399673_44069484 1 : Thread is interrupted.
2011-06-30 18:52:38,215 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for
block blk_-3883253381457682039_44069367 terminating
2011-06-30 18:52:38,216 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for
block blk_-7111600897613399673_44069484 terminating
2011-06-30 18:52:38,216 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-3883253381457682039_44069367 received exception
java.io.EOFException: while trying
to read 65557 bytes
2011-06-30 18:52:38,216 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-7111600897613399673_44069484 received exception
java.io.EOFException: while trying
to read 65557 bytes
2011-06-30 18:52:38,216 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(172.28.1.125:50010,
storageID=DS-2025332107-172.28.1.125-50010-1300893119361
, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)
2011-06-30 18:52:38,216 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(172.28.1.125:50010,
storageID=DS-2025332107-172.28.1.125-50010-1300893119361
, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)
2011-06-30 18:52:38,223 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
receiveBlock for block blk_-9189210190114502992_32988083
java.io.EOFException: while trying to read 2730 bytes
2011-06-30 18:52:38,305 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
receiveBlock for block blk_-494032072952213794_44069458
java.io.EOFException: while trying to read 65557 bytes
2011-06-30 18:52:38,305 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-494032072952213794_44069458 2 Exception
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:119)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:882)
at java.lang.Thread.run(Thread.java:619)
2011-06-30 18:52:38,305
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_-494032072952213794_44069458 2 : Thread is interrupted.
2011-06-30 18:52:38,305 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for
block blk_-494032072952213794_44069458 terminating
2011-06-30 18:52:38,306 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-494032072952213794_44069458 received exception
java.io.EOFException: while trying
to read 65557 bytes
2011-06-30 18:52:38,306 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(172.28.1.125:50010,
storageID=DS-2025332107-172.28.1.125-50010-1300893119361
, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)
2011-06-30 18:52:38,471 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: Block
blk_-9220320572849103952_19298441 unfinalized and removed.
2011-06-30 18:52:38,471 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/172.28.1.125:50010, dest: /172.28.1.142:53133, bytes: 0, op: HDFS_READ,
cliID: DFSClient_attempt_201106290946_0608_m_000004_1, offset: 0, srvID:
DS-2025332107-172.28.1.125-50010-1300893119361, blockid:
blk_6335845677046126634_44068668, duration: 296000
2011-06-30 18:52:38,472 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-9220320572849103952_19298441 received exception
java.io.EOFException: while trying
to read 25370 bytes
2011-06-30 18:52:38,472 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(172.28.1.125:50010,
storageID=DS-2025332107-172.28.1.125-50010-1300893119361
, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 25370 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:355)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)
2011-06-30 18:52:38,472 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Received block
blk_-9080487810231223837_12934481 src: /172.28.1.128:48816 dest:
/172.28.1.125:50010 of size 934228
2011-06-30 18:52:38,472 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Received block
blk_-9047841941671274596_33117184 src: /172.28.1.139:57972 dest:
/172.28.1.125:50010 of size 436507
2011-06-30 18:52:38,473 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: Block
blk_-9189210190114502992_32988083 unfinalized and removed.
2011-06-30 18:52:38,473 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-9189210190114502992_32988083 received exception
java.io.EOFException: while trying
to read 2730 bytes
2011-06-30 18:52:38,473 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(172.28.1.125:50010,
storageID=DS-2025332107-172.28.1.125-50010-1300893119361
, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 2730 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:355)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)
2011-06-30 18:52:38,473 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Received block
blk_-9167383395209195324_43113407 src: /172.28.1.129:42479 dest:
/172.28.1.125:50010 of size 306521
2011-06-30 18:52:38,474 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: Block
blk_-9222760032390209760_33640525 unfinalized and removed.
2011-06-30 18:52:38,474 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-9222760032390209760_33640525 received exception
java.io.EOFException: while trying
to read 5034 bytes
2011-06-30 18:52:38,474 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(172.28.1.125:50010,
storageID=DS-2025332107-172.28.1.125-50010-1300893119361
, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 5034 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:355)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)
2011-06-30 18:52:38,475 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Received block
blk_-4835040186380340610_22556835 src: /172.28.1.129:42385 dest:
/172.28.1.125:50010 of size 1873718
2011-06-30 18:52:38,500 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/172.28.1.125:50010, dest: /172.28.1.103:48797, bytes: 0, op: HDFS_READ,
cliID: DFSClient_attempt_201106220308_2593_r_000103_0, offset: 0, srvID:
DS-2025332107-172.28.1.125-50010-1300893119361, blockid:
blk_-7869815404835809891_44068763, duration: 26414000
2011-06-30 18:52:38,543 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Received block
blk_-8830898802917406976_9670906 src: /172.28.1.135:38108 dest:
/172.28.1.125:50010 of size 1586179
2011-06-30 18:52:38,545 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/172.28.1.125:50010, dest: /172.28.1.108:49754, bytes: 0, op: HDFS_READ,
cliID: DFSClient_attempt_201106290946_0608_m_000117_1, offset: 0, srvID:
DS-2025332107-172.28.1.125-50010-1300893119361, blockid:
blk_8222968190463524631_44068718, duration: 71801000
2011-06-30 18:52:38,557 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/172.28.1.125:50010, dest: /172.28.1.118:34031, bytes: 0, op: HDFS_READ,
cliID: DFSClient_attempt_201106290946_0605_m_005176_0, offset: 0, srvID:
DS-2025332107-172.28.1.125-50010-1300893119361, blockid:
blk_1403998834571958702_41330639, duration: 85592000
2011-06-30 18:52:38,587 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Received block
blk_-3977598016817417325_33983630 src: /172.28.1.107:59369 dest:
/172.28.1.125:50010 of size 5787499
2011-06-30 18:52:38,614 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/172.28.1.125:50010, dest: /172.28.1.138:47611, bytes: 0, op: HDFS_READ,
cliID: DFSClient_attempt_201106290946_0605_m_004132_0, offset: 0, srvID:
DS-2025332107-172.28.1.125-50010-1300893119361, blockid:
blk_-2960100665116847711_42689744, duration: 141415000
2011-06-30 18:52:38,644 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/172.28.1.125:50010, dest: /172.28.1.112:55765, bytes: 0, op: HDFS_READ,
cliID: DFSClient_attempt_201106290946_0608_m_000303_1, offset: 0, srvID:
DS-2025332107-172.28.1.125-50010-1300893119361, blockid:
blk_8063669434333395104_44065929, duration: 1272000
2011-06-30 18:52:38,655 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_3068907439054059419_27968064
2011-06-30 18:52:38,914 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Received block
blk_-8758412301877668079_31392851 src: /172.28.1.139:60920 dest:
/172.28.1.125:50010 of size 3166291
Re: Dead data nodes during job execution and failed tasks.
Posted by Allen Wittenauer <aw...@apache.org>.
On Jun 30, 2011, at 12:36 PM, David Ginzburg wrote:
>
> Is it possible even though the server runs with vm.swappiness = 5?
That only controls how aggressively the system swaps. If you eat all the RAM in user space, the system is going to start paging memory regardless of swappiness.
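One way to confirm the box is actually paging while the job runs is to sample the kernel's cumulative swap counters (a sketch; the `pswpin`/`pswpout` fields of `/proc/vmstat` are standard on Linux):

```shell
#!/bin/sh
# Sample the cumulative swap-in/swap-out counters twice; a growing delta
# between samples means the box is actively paging, whatever the
# vm.swappiness setting says.
before_in=$(awk '/^pswpin/ {print $2}' /proc/vmstat)
before_out=$(awk '/^pswpout/ {print $2}' /proc/vmstat)
sleep 5
after_in=$(awk '/^pswpin/ {print $2}' /proc/vmstat)
after_out=$(awk '/^pswpout/ {print $2}' /proc/vmstat)
echo "pages swapped in:  $((after_in - before_in))"
echo "pages swapped out: $((after_out - before_out))"
```

If both deltas stay at zero during the job while DataNodes still drop out, memory pressure is probably not the cause.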
RE: Dead data nodes during job execution and failed tasks.
Posted by David Ginzburg <gi...@hotmail.com>.
Is it possible even though the server runs with vm.swappiness = 5?
> Subject: Re: Dead data nodes during job execution and failed tasks.
> From: aw@apache.org
> Date: Thu, 30 Jun 2011 11:46:25 -0700
> To: mapreduce-user@hadoop.apache.org
>
>
> On Jun 30, 2011, at 10:01 AM, David Ginzburg wrote:
>
> >
> > Hi,
> > I am running a certain job that consistently causes data nodes to be marked dead (they come back later, spontaneously).
>
> Check your memory usage during the job run. Chances are good the DataNode is getting swapped out.
Re: Dead data nodes during job execution and failed tasks.
Posted by Allen Wittenauer <aw...@apache.org>.
On Jun 30, 2011, at 10:01 AM, David Ginzburg wrote:
>
> Hi,
> I am running a certain job that consistently causes data nodes to be marked dead (they come back later, spontaneously).
Check your memory usage during the job run. Chances are good the DataNode is getting swapped out.
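A minimal way to watch the DataNode's memory while the job runs (a sketch; the pgrep pattern is an assumption about the deployment, and the fallback to the current shell is only there so the script does something when no DataNode is present):

```shell
#!/bin/sh
# Periodically print the DataNode's resident set size. If RSS plus the
# map/reduce child JVMs on the box approaches physical RAM, the kernel
# starts paging and the DataNode can miss heartbeats, appearing dead.
pid=$(pgrep -f 'org.apache.hadoop.hdfs.server.datanode.DataNode' | head -n 1)
if [ -z "$pid" ]; then
    echo "no DataNode process found; sampling this shell instead"
    pid=$$
fi
# Two samples shown for illustration; in practice loop for the job's duration.
for i in 1 2; do
    ps -o rss=,vsz= -p "$pid" | awk '{printf "rss=%.0f MB  vsz=%.0f MB\n", $1/1024, $2/1024}'
    sleep 1
done
```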