Posted to mapreduce-user@hadoop.apache.org by rakesh kothari <rk...@hotmail.com> on 2010/10/12 21:53:11 UTC
Failures in the reducers
Hi,
My MR job is processing gzipped files, each around 450 MB, and there are 24 of them. The file block size is 512 MB.
This job is failing consistently in the reduce phase with the exception below. Any ideas on how to troubleshoot this?
Thanks,
-Rakesh
Datanode logs:
INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 408736960 bytes
2010-10-12 07:25:01,020 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.185.13.61:50010
2010-10-12 07:25:01,021 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-961587459095414398_368580
2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.185.13.61:50010
2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7795697604292519140_368580
2010-10-12 07:27:05,526 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-12 07:27:05,527 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7687883740524807660_368625
2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-5546440551650461919_368626
2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3894897742813130478_368628
2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8687736970664350304_368652
2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8687736970664350304_368652 bad datanode[0] nodes == null
2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/dartlog-json-serializer/20100929_/_temporary/_attempt_201010082153_0040_r_000000_2/jp/dart-imp-json/2010/09/29/17/part-r-00000.gz" - Aborting...
2010-10-12 07:27:30,196 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2868)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
2010-10-12 07:27:30,199 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
The namenode is throwing the following exception:
2010-10-12 07:27:30,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-892355450837523222_368657 src: /10.43.102.69:42352 dest: /10.43.102.69:50010
2010-10-12 07:27:30,206 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-892355450837523222_368657 received exception java.io.EOFException
2010-10-12 07:27:30,206 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)
2010-10-12 07:27:30,272 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_786696549206331718_368657 src: /10.184.82.24:53457 dest: /10.43.102.69:50010
2010-10-12 07:27:30,459 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-6729043740571856940_368657 src: /10.185.13.60:41816 dest: /10.43.102.69:50010
2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.185.13.61:48770, dest: /10.43.102.69:50010, bytes: 1626784, op: HDFS_WRITE, cliID: DFSClient_attempt_201010082153_0040_r_000000_2, srvID: DS-859924705-10.43.102.69-50010-1271546912162, blockid: blk_9216465415312085861_368611
2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_9216465415312085861_368611 terminating
2010-10-12 07:27:30,755 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5680087852988027619_321244
2010-10-12 07:27:30,759 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1637914415591966611_321290
…
2010-10-12 07:27:56,412 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)
2010-10-12 07:27:56,976 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5731266331675183628_321238
2010-10-12 07:27:57,669 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)
2010-10-12 07:27:58,976 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)
Re: Failures in the reducers
Posted by David Rosenstrauch <da...@darose.net>.
We ran into this recently. The solution was to bump up the value of the dfs.datanode.max.xcievers setting.
HTH,
DR
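A sketch of the kind of entry involved, assuming the usual 0.20-era layout where datanode settings live in each datanode's hdfs-site.xml (file name and value are assumptions; the value below is illustrative, and the datanodes need a restart afterwards):

```xml
<!-- hdfs-site.xml on each datanode. The property name really is spelled
     "xcievers" in this Hadoop generation; the default limit was 256. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```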
On 10/12/2010 03:53 PM, rakesh kothari wrote:
>
> Hi,
>
> My MR Job is processing gzipped files each around 450 MB and there are 24 of them. File block size is 512 MB.
>
> This job is failing consistently in the reduce phase with the following exception (below). Any ideas how to troubleshoot this ?
>
> Thanks,
> -Rakesh
>
RE: Failures in the reducers
Posted by rakesh kothari <rk...@hotmail.com>.
No. It just runs this job. It's a 7-node cluster with 3 mapper and 2 reducer slots per node.
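One rough way to see where the connections come from is to count active data-transfer connections on a datanode host while the job runs; each one ties up a DataXceiver thread, so this approximates the xceiverCount in the error (50010 is the default datanode transfer port, an assumption if your cluster overrides it):

```shell
# Count established TCP connections to the datanode's data-transfer port.
# Run on a datanode host; compare against dfs.datanode.max.xcievers.
netstat -tn 2>/dev/null | grep -c ':50010 '
```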
Date: Tue, 12 Oct 2010 13:23:23 -0700
Subject: Re: Failures in the reducers
From: shrijeet@rocketfuel.com
To: mapreduce-user@hadoop.apache.org
Is your cluster busy doing other things? (while this job is running)
On Tue, Oct 12, 2010 at 1:15 PM, rakesh kothari <rk...@hotmail.com> wrote:
Thanks, Shrijeet. Yeah, sorry, both of these logs are from datanodes.
Also, I don't get this error when I run my job on just one file (450 MB).
I wonder why this happens in the reduce stage, since I have just 10 reducers and I don't see how those 256 connections are being opened.
-Rakesh
Date: Tue, 12 Oct 2010 13:02:16 -0700
Subject: Re: Failures in the reducers
From: shrijeet@rocketfuel.com
To: mapreduce-user@hadoop.apache.org
Rakesh, that error log looks like it belongs to the DataNode, not the NameNode. Anyway, try bumping the parameter named dfs.datanode.max.xcievers up (shoot for 512). This param belongs in core-site.xml.
-Shrijeet
On Tue, Oct 12, 2010 at 12:53 PM, rakesh kothari <rk...@hotmail.com> wrote:
Hi,
My MR Job is processing gzipped files each around 450 MB and there are 24 of them. File block size is 512 MB.
This job is failing consistently in the reduce phase with the following exception (below). Any ideas how to troubleshoot this ?
Thanks,
-Rakesh
Re: Failures in the reducers
Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
Is your cluster busy doing other things? (while this job is running)
On Tue, Oct 12, 2010 at 1:15 PM, rakesh kothari <rk...@hotmail.com> wrote:
> Thanks Shrijeet. Yeah, sorry both of these logs are from datanodes.
>
> Also, I don't get this error when I run my job on just 1 file (450 MB).
>
> I wonder why this happen in the reduce stage since I just have 10 reducers
> and I don't see how those 256 connections are being opened.
>
> -Rakesh
>
> ------------------------------
> Date: Tue, 12 Oct 2010 13:02:16 -0700
> Subject: Re: Failures in the reducers
> From: shrijeet@rocketfuel.com
> To: mapreduce-user@hadoop.apache.org
>
>
> Rakesh,
> That error log looks like it belonged to DataNode and not NameNode. Anyways
> try pumping the parameter named *dfs.datanode.max.xcievers* up (shoot for
> 512). This param belongs to core-site.xml .
>
> -Shrijeet
>
> On Tue, Oct 12, 2010 at 12:53 PM, rakesh kothari <rkothari_iit@hotmail.com
> > wrote:
>
> Hi,
>
> My MR Job is processing gzipped files each around 450 MB and there are 24
> of them. File block size is 512 MB.
>
> This job is failing consistently in the reduce phase with the following
> exception (below). Any ideas how to troubleshoot this ?
>
> Thanks,
> -Rakesh
>
RE: Failures in the reducers
Posted by rakesh kothari <rk...@hotmail.com>.
Thanks, Shrijeet. Yeah, sorry, both of these logs are from datanodes.
Also, I don't get this error when I run my job on just one file (450 MB).
I wonder why this happens in the reduce stage, since I have just 10 reducers and I don't see how those 256 connections are being opened.
-Rakesh
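[Editor's note] On the question of how 10 reducers could hit a 256-xceiver ceiling: each active HDFS write occupies one DataXceiver thread on every datanode in its replication pipeline, so reducer output alone accounts for only a handful of threads per node. A back-of-envelope sketch of that arithmetic (the cluster size and replication factor below are hypothetical, not taken from this thread):

```python
def xceivers_per_node(writers, replication, datanodes, baseline=0):
    """Average DataXceiver threads each datanode serves, assuming every
    active write pipeline pins one thread on each of its `replication`
    datanodes and load is spread evenly across the cluster. `baseline`
    stands in for unrelated concurrent reads/writes on the same nodes."""
    if datanodes <= 0:
        raise ValueError("datanodes must be positive")
    return writers * replication / datanodes + baseline

# 10 reducers at replication 3 on a hypothetical 5-node cluster:
print(xceivers_per_node(writers=10, replication=3, datanodes=5))  # 6.0
```

Even generous assumptions leave reducer output far below 256, which suggests the threads come mostly from other concurrent clients (other jobs, speculative attempts, block transfers) sharing the same datanodes.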
Date: Tue, 12 Oct 2010 13:02:16 -0700
Subject: Re: Failures in the reducers
From: shrijeet@rocketfuel.com
To: mapreduce-user@hadoop.apache.org
Rakesh, That error log looks like it belongs to the DataNode, not the NameNode. Anyway, try raising the parameter named dfs.datanode.max.xcievers (shoot for 512). This param belongs to hdfs-site.xml.
-Shrijeet
On Tue, Oct 12, 2010 at 12:53 PM, rakesh kothari <rk...@hotmail.com> wrote:
Hi,
My MR Job is processing gzipped files each around 450 MB and there are 24 of them. File block size is 512 MB.
This job is failing consistently in the reduce phase with the following exception (below). Any ideas how to troubleshoot this ?
Thanks,
-Rakesh
Datanode logs:
INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 408736960 bytes
2010-10-12 07:25:01,020 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.185.13.61:50010
2010-10-12 07:25:01,021 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-961587459095414398_368580
2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.185.13.61:50010
2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7795697604292519140_368580
2010-10-12 07:27:05,526 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-12 07:27:05,527 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7687883740524807660_368625
2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-5546440551650461919_368626
2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3894897742813130478_368628
2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8687736970664350304_368652
2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8687736970664350304_368652 bad datanode[0] nodes == null
2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/dartlog-json-serializer/20100929_/_temporary/_attempt_201010082153_0040_r_000000_2/jp/dart-imp-json/2010/09/29/17/part-r-00000.gz" - Aborting...
2010-10-12 07:27:30,196 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2868)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
2010-10-12 07:27:30,199 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
Namenode is throwing the following exception:
2010-10-12 07:27:30,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-892355450837523222_368657 src: /10.43.102.69:42352 dest: /10.43.102.69:50010
2010-10-12 07:27:30,206 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-892355450837523222_368657 received exception java.io.EOFException
2010-10-12 07:27:30,206 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)
2010-10-12 07:27:30,272 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_786696549206331718_368657 src: /10.184.82.24:53457 dest: /10.43.102.69:50010
2010-10-12 07:27:30,459 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-6729043740571856940_368657 src: /10.185.13.60:41816 dest: /10.43.102.69:50010
2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.185.13.61:48770, dest: /10.43.102.69:50010, bytes: 1626784, op: HDFS_WRITE, cliID: DFSClient_attempt_201010082153_0040_r_000000_2, srvID: DS-859924705-10.43.102.69-50010-1271546912162, blockid: blk_9216465415312085861_368611
2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_9216465415312085861_368611 terminating
2010-10-12 07:27:30,755 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5680087852988027619_321244
2010-10-12 07:27:30,759 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1637914415591966611_321290
…
2010-10-12 07:27:56,412 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)
2010-10-12 07:27:56,976 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5731266331675183628_321238
2010-10-12 07:27:57,669 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)
2010-10-12 07:27:58,976 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)
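[Editor's note] The xceiverCount figure in the errors above can be cross-checked on a live datanode: each active read/write stream shows up as a DataXceiver thread in a `jstack` dump of the datanode JVM. The block below demonstrates the counting step on a small sample dump (the thread names are illustrative; on a real node you would feed actual `jstack` output):

```shell
# On a live node you would run something like:
#   jstack <datanode_pid> | grep -c DataXceiver
# Here we count DataXceiver threads in a tiny hypothetical sample dump.
sample_dump='"org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a1" daemon prio=5
"org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a2" daemon prio=5
"DataBlockScanner" daemon prio=5'
printf '%s\n' "$sample_dump" | grep -c DataXceiver   # prints 2
```

Watching this count while the reduce phase runs would show whether this job alone, or the aggregate cluster load, is driving the node past its limit.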
Re: Failures in the reducers
Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
Rakesh,
That error log looks like it belongs to the DataNode, not the NameNode. Anyway,
try raising the parameter named *dfs.datanode.max.xcievers* (shoot for
512). This param belongs to hdfs-site.xml.
-Shrijeet
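[Editor's note] The suggested fix would look like the fragment below. In the stock Hadoop layout this datanode property is read from hdfs-site.xml, and the misspelling "xcievers" is Hadoop's own and must be kept; datanodes need a restart for the change to take effect. The value 512 is Shrijeet's suggestion, not a universal recommendation.

```xml
<!-- hdfs-site.xml on each datanode -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>512</value>
</property>
```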
On Tue, Oct 12, 2010 at 12:53 PM, rakesh kothari
<rk...@hotmail.com>wrote:
> Hi,
>
> My MR Job is processing gzipped files each around 450 MB and there are 24
> of them. File block size is 512 MB.
>
> This job is failing consistently in the reduce phase with the following
> exception (below). Any ideas how to troubleshoot this ?
>
> Thanks,
> -Rakesh
>
> Datanode logs:
>
> INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10
> segments left of total size: 408736960 bytes
>
> 2010-10-12 07:25:01,020 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Bad connect ack with
> firstBadLink 10.185.13.61:50010
>
> 2010-10-12 07:25:01,021 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-961587459095414398_368580
>
> 2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Bad connect ack with
> firstBadLink 10.185.13.61:50010
>
> 2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-7795697604292519140_368580
>
> 2010-10-12 07:27:05,526 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.EOFException
>
> 2010-10-12 07:27:05,527 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-7687883740524807660_368625
>
> 2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.EOFException
>
> 2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-5546440551650461919_368626
>
> 2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.EOFException
>
> 2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-3894897742813130478_368628
>
> 2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.EOFException
>
> 2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_8687736970664350304_368652
>
> 2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: java.io.IOException: Unable to create new block.
>
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
>
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>
>
>
> 2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_8687736970664350304_368652 bad datanode[0] nodes ==
> null
>
> 2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Could not
> get block locations. Source file
> "/tmp/dartlog-json-serializer/20100929_/_temporary/_attempt_201010082153_0040_r_000000_2/jp/dart-imp-json/2010/09/29/17/part-r-00000.gz"
> - Aborting...
>
> 2010-10-12 07:27:30,196 WARN org.apache.hadoop.mapred.TaskTracker: Error
> running child
>
> java.io.EOFException
>
> at java.io.DataInputStream.readByte(DataInputStream.java:250)
>
> at
> org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>
> at
> org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>
> at org.apache.hadoop.io.Text.readString(Text.java:400)
>
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2868)
>
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
>
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>
> 2010-10-12 07:27:30,199 INFO org.apache.hadoop.mapred.TaskRunner: Runnning
> cleanup for the task
>
>
> Namenode is throwing following exception:
>
> 2010-10-12 07:27:30,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-892355450837523222_368657 src: /10.43.102.69:42352 dest: /10.43.102.69:50010
>
> 2010-10-12 07:27:30,206 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-892355450837523222_368657 received exception java.io.EOFException
>
> 2010-10-12 07:27:30,206 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
>
> java.io.EOFException
>
> at java.io.DataInputStream.readByte(DataInputStream.java:250)
>
> at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>
> at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>
> at org.apache.hadoop.io.Text.readString(Text.java:400)
>
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
>
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>
> at java.lang.Thread.run(Thread.java:619)
>
> 2010-10-12 07:27:30,272 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_786696549206331718_368657 src: /10.184.82.24:53457 dest: /10.43.102.69:50010
>
> 2010-10-12 07:27:30,459 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-6729043740571856940_368657 src: /10.185.13.60:41816 dest: /10.43.102.69:50010
>
> 2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.185.13.61:48770, dest: /10.43.102.69:50010, bytes: 1626784, op: HDFS_WRITE, cliID: DFSClient_attempt_201010082153_0040_r_000000_2, srvID: DS-859924705-10.43.102.69-50010-1271546912162, blockid: blk_9216465415312085861_368611
>
> 2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_9216465415312085861_368611 terminating
>
> 2010-10-12 07:27:30,755 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5680087852988027619_321244
>
> 2010-10-12 07:27:30,759 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1637914415591966611_321290
>
> …
>
> 2010-10-12 07:27:56,412 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
>
> java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
>
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>
> at java.lang.Thread.run(Thread.java:619)
>
> 2010-10-12 07:27:56,976 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5731266331675183628_321238
>
> 2010-10-12 07:27:57,669 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
>
> java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
>
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>
> at java.lang.Thread.run(Thread.java:619)
>
> 2010-10-12 07:27:58,976 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
>
> java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
>
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>
> at java.lang.Thread.run(Thread.java:619)
>