Posted to mapreduce-user@hadoop.apache.org by David Ginzburg <gi...@hotmail.com> on 2011/06/30 19:01:57 UTC

Dead data nodes during job execution and failed tasks.

Hi,
I am running a certain job that consistently causes dead datanodes (they come back later, spontaneously).

This causes some tasks to fail. My max fd limit is 64k.
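For reference, the limit can be double-checked on a datanode like this (a rough sketch using standard Linux commands; the pgrep pattern is just one way to match the DataNode process, and /proc/<pid>/limits needs a reasonably recent kernel):

    ulimit -n                                       # limit for the current shell
    DN_PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode)
    grep 'open files' /proc/$DN_PID/limits          # limit actually applied to the running DataNode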

Can anyone identify what is causing this?

Here are logs from the failed task and the corresponding DataNode log snippets:


Slave running the map task:


2011-06-30 18:49:00,338 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=

2011-06-30 18:49:00,583 WARN org.apache.hadoop.conf.Configuration: /mnt1/tmp/hadoop-0.20/cache/root/mapred/local/taskTracker/jobcache/job_201106290946_0605/attempt_201106290946_0605_m_004201_0/job.xml:a attempt to override final parameter: dfs.hosts.exclude;  Ignoring.

2011-06-30 18:50:00,845 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-3550309578268660022_44008639 from any node:  java.io.IOException: No live nodes contain current block

2011-06-30 18:51:03,865 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-3550309578268660022_44008639 from any node:  java.io.IOException: No live nodes contain current block

2011-06-30 18:52:07,003 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-3550309578268660022_44008639 from any node:  java.io.IOException: No live nodes contain current block

2011-06-30 18:54:10,075 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-3550309578268660022_44008639 file=/user/root/user_login_log/distinct_data/20110629/part-00124

                at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1797)

                at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1623)

                at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1752)

                at java.io.DataInputStream.readFully(DataInputStream.java:178)

                at java.io.DataInputStream.readFully(DataInputStream.java:152)

                at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)

                at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)

                at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)

                at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)

                at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)

                at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)

                at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)

                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)

                at org.apache.hadoop.mapred.Child.main(Child.java:170)

 

2011-06-30 18:54:12,576 WARN org.apache.hadoop.mapred.TaskTracker: Error running child

java.io.IOException: Could not obtain block: blk_-3550309578268660022_44008639 file=/user/root/user_login_log/distinct_data/20110629/part-00124

                at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1797)

                at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1623)

                at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1752)

                at java.io.DataInputStream.readFully(DataInputStream.java:178)

                at java.io.DataInputStream.readFully(DataInputStream.java:152)

                at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)

                at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)

                at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)

                at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)

                at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)

                at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)

                at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)

                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)

                at org.apache.hadoop.mapred.Child.main(Child.java:170)

2011-06-30 18:54:14,635 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task

 

 

 

DataNode on slave25:

 

 

2011-06-30 18:52:38,126 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.119:48947, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004201_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_-3550309578268660022_44008639, duration: 146739000

2011-06-30 18:52:38,126 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.119:55630, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004201_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_-3550309578268660022_44008639, duration: 147680000

2011-06-30 18:52:38,126 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.119:48589, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004201_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_-3550309578268660022_44008639, duration: 147021000

2011-06-30 18:52:38,126 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.119:49278, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004201_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_-3550309578268660022_44008639, duration: 137582000

2011-06-30 18:52:38,135 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.138:47818, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004252_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_341656939579646802_43116464, duration: 30340000

2011-06-30 18:52:38,184 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.115:51798, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004932_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_1807082107808304792_41330622, duration: 195543000

2011-06-30 18:52:38,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-7111600897613399673_44069484 java.io.EOFException: while trying to read 65557 bytes

2011-06-30 18:52:38,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-3883253381457682039_44069367 java.io.EOFException: while trying to read 65557 bytes

2011-06-30 18:52:38,215 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-3883253381457682039_44069367 1 Exception java.nio.channels.ClosedByInterruptException

        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)

        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)

        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)

        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)

        at java.io.DataInputStream.readFully(DataInputStream.java:178)

        at java.io.DataInputStream.readLong(DataInputStream.java:399)

        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:119)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:882)

        at java.lang.Thread.run(Thread.java:619)

 

2011-06-30 18:52:38,215 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-7111600897613399673_44069484 1 Exception java.nio.channels.ClosedByInterruptException

        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)

        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)

        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)

        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)

        at java.io.DataInputStream.readFully(DataInputStream.java:178)

        at java.io.DataInputStream.readLong(DataInputStream.java:399)

        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:119)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:882)

        at java.lang.Thread.run(Thread.java:619)

 

2011-06-30 18:52:38,215 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-3883253381457682039_44069367 1 : Thread is interrupted.

2011-06-30 18:52:38,215 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-7111600897613399673_44069484 1 : Thread is interrupted.

2011-06-30 18:52:38,215 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_-3883253381457682039_44069367 terminating

2011-06-30 18:52:38,216 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_-7111600897613399673_44069484 terminating

2011-06-30 18:52:38,216 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-3883253381457682039_44069367 received exception java.io.EOFException: while trying to read 65557 bytes

2011-06-30 18:52:38,216 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-7111600897613399673_44069484 received exception java.io.EOFException: while trying to read 65557 bytes

2011-06-30 18:52:38,216 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.28.1.125:50010, storageID=DS-2025332107-172.28.1.125-50010-1300893119361, infoPort=50075, ipcPort=50020):DataXceiver

java.io.EOFException: while trying to read 65557 bytes

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)

        at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,216 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.28.1.125:50010, storageID=DS-2025332107-172.28.1.125-50010-1300893119361, infoPort=50075, ipcPort=50020):DataXceiver

java.io.EOFException: while trying to read 65557 bytes

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)

        at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,223 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-9189210190114502992_32988083 java.io.EOFException: while trying to read 2730 bytes

2011-06-30 18:52:38,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-494032072952213794_44069458 java.io.EOFException: while trying to read 65557 bytes

2011-06-30 18:52:38,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-494032072952213794_44069458 2 Exception java.nio.channels.ClosedByInterruptException

        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)

        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)

        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)

        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)

        at java.io.DataInputStream.readFully(DataInputStream.java:178)

        at java.io.DataInputStream.readLong(DataInputStream.java:399)

        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:119)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:882)

        at java.lang.Thread.run(Thread.java:619)

 

2011-06-30 18:52:38,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-494032072952213794_44069458 2 : Thread is interrupted.

2011-06-30 18:52:38,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_-494032072952213794_44069458 terminating

2011-06-30 18:52:38,306 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-494032072952213794_44069458 received exception java.io.EOFException: while trying to read 65557 bytes

2011-06-30 18:52:38,306 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.28.1.125:50010, storageID=DS-2025332107-172.28.1.125-50010-1300893119361, infoPort=50075, ipcPort=50020):DataXceiver

java.io.EOFException: while trying to read 65557 bytes

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)

        at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,471 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9220320572849103952_19298441 unfinalized and removed.

2011-06-30 18:52:38,471 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.142:53133, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0608_m_000004_1, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_6335845677046126634_44068668, duration: 296000

2011-06-30 18:52:38,472 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9220320572849103952_19298441 received exception java.io.EOFException: while trying to read 25370 bytes

2011-06-30 18:52:38,472 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.28.1.125:50010, storageID=DS-2025332107-172.28.1.125-50010-1300893119361, infoPort=50075, ipcPort=50020):DataXceiver

java.io.EOFException: while trying to read 25370 bytes

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:355)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)

        at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,472 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9080487810231223837_12934481 src: /172.28.1.128:48816 dest: /172.28.1.125:50010 of size 934228

2011-06-30 18:52:38,472 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9047841941671274596_33117184 src: /172.28.1.139:57972 dest: /172.28.1.125:50010 of size 436507

2011-06-30 18:52:38,473 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9189210190114502992_32988083 unfinalized and removed.

2011-06-30 18:52:38,473 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9189210190114502992_32988083 received exception java.io.EOFException: while trying to read 2730 bytes

2011-06-30 18:52:38,473 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.28.1.125:50010, storageID=DS-2025332107-172.28.1.125-50010-1300893119361, infoPort=50075, ipcPort=50020):DataXceiver

java.io.EOFException: while trying to read 2730 bytes

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:355)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)

        at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,473 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-9167383395209195324_43113407 src: /172.28.1.129:42479 dest: /172.28.1.125:50010 of size 306521

2011-06-30 18:52:38,474 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block blk_-9222760032390209760_33640525 unfinalized and removed.

2011-06-30 18:52:38,474 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-9222760032390209760_33640525 received exception java.io.EOFException: while trying to read 5034 bytes

2011-06-30 18:52:38,474 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.28.1.125:50010, storageID=DS-2025332107-172.28.1.125-50010-1300893119361, infoPort=50075, ipcPort=50020):DataXceiver

java.io.EOFException: while trying to read 5034 bytes

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:355)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:528)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)

        at java.lang.Thread.run(Thread.java:619)

2011-06-30 18:52:38,475 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-4835040186380340610_22556835 src: /172.28.1.129:42385 dest: /172.28.1.125:50010 of size 1873718

2011-06-30 18:52:38,500 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.103:48797, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106220308_2593_r_000103_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_-7869815404835809891_44068763, duration: 26414000

2011-06-30 18:52:38,543 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-8830898802917406976_9670906 src: /172.28.1.135:38108 dest: /172.28.1.125:50010 of size 1586179

2011-06-30 18:52:38,545 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.108:49754, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0608_m_000117_1, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_8222968190463524631_44068718, duration: 71801000

2011-06-30 18:52:38,557 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.118:34031, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_005176_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_1403998834571958702_41330639, duration: 85592000

2011-06-30 18:52:38,587 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-3977598016817417325_33983630 src: /172.28.1.107:59369 dest: /172.28.1.125:50010 of size 5787499

2011-06-30 18:52:38,614 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.138:47611, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0605_m_004132_0, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_-2960100665116847711_42689744, duration: 141415000

2011-06-30 18:52:38,644 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.28.1.125:50010, dest: /172.28.1.112:55765, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201106290946_0608_m_000303_1, offset: 0, srvID: DS-2025332107-172.28.1.125-50010-1300893119361, blockid: blk_8063669434333395104_44065929, duration: 1272000

2011-06-30 18:52:38,655 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_3068907439054059419_27968064

2011-06-30 18:52:38,914 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_-8758412301877668079_31392851 src: /172.28.1.139:60920 dest: /172.28.1.125:50010 of size 3166291

:

Re: Dead data nodes during job execution and failed tasks.

Posted by Allen Wittenauer <aw...@apache.org>.
On Jun 30, 2011, at 12:36 PM, David Ginzburg wrote:

> 
> Is it possible even though the server runs with vm.swappiness = 5?

	That only controls how aggressively the system swaps.  If you eat all the RAM in user space, the system is going to start paging memory regardless of swappiness.
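	A quick way to see whether the box is actually paging during the job run is to watch swap activity with plain Linux tools (nothing Hadoop-specific; just a sketch):

    vmstat 5     # sustained non-zero si/so (swap-in/swap-out) columns mean the box is actively paging
    free -m      # shows how much swap is in use; watch whether it grows while the job runs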



RE: Dead data nodes during job execution and failed tasks.

Posted by David Ginzburg <gi...@hotmail.com>.
Is it possible even though the server runs with vm.swappiness = 5?



> Subject: Re: Dead data nodes during job execution and failed tasks.
> From: aw@apache.org
> Date: Thu, 30 Jun 2011 11:46:25 -0700
> To: mapreduce-user@hadoop.apache.org
> 
> 
> On Jun 30, 2011, at 10:01 AM, David Ginzburg wrote:
> 
> > 
> > Hi,
> > I am running a certain job that consistently causes dead datanodes (they come back later, spontaneously).
> 
> 	Check your memory usage during the job run.  Chances are good the DataNode is getting swapped out.

Re: Dead data nodes during job execution and failed tasks.

Posted by Allen Wittenauer <aw...@apache.org>.
On Jun 30, 2011, at 10:01 AM, David Ginzburg wrote:

> 
> Hi,
> I am running a certain job that consistently causes dead datanodes (they come back later, spontaneously).

	Check your memory usage during the job run.  Chances are good the DataNode is getting swapped out.
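	For example, something like this run on one of the datanodes while the job executes (a rough sketch; the pgrep pattern is just one way to find the DataNode process):

    DN_PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode)
    while true; do
        ps -o pid,rss,vsz,args -p "$DN_PID"   # resident vs. virtual size of the DataNode
        free -m                               # overall memory and swap usage on the box
        sleep 30
    done

	If the DataNode's resident size collapses while swap usage climbs, it is getting paged out, its heartbeats to the NameNode get delayed, and the NameNode marks it dead until it becomes responsive again. That would match nodes that come back later on their own.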