Posted to mapreduce-user@hadoop.apache.org by rakesh kothari <rk...@hotmail.com> on 2010/10/12 21:53:11 UTC

Failures in the reducers

Hi,

My MR job is processing gzipped files, each around 450 MB, and there are 24 of them. The file block size is 512 MB.

This job is failing consistently in the reduce phase with the following exception (below). Any ideas how to troubleshoot this?

Thanks,
-Rakesh

Datanode logs:



INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 408736960 bytes

2010-10-12 07:25:01,020 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.185.13.61:50010

2010-10-12 07:25:01,021 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-961587459095414398_368580

2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.185.13.61:50010

2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7795697604292519140_368580

2010-10-12 07:27:05,526 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException

2010-10-12 07:27:05,527 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7687883740524807660_368625

2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException

2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-5546440551650461919_368626

2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException

2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3894897742813130478_368628

2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException

2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8687736970664350304_368652

2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)

2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8687736970664350304_368652 bad datanode[0] nodes == null

2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/dartlog-json-serializer/20100929_/_temporary/_attempt_201010082153_0040_r_000000_2/jp/dart-imp-json/2010/09/29/17/part-r-00000.gz" - Aborting...

2010-10-12 07:27:30,196 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2868)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)

2010-10-12 07:27:30,199 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task



The namenode is throwing the following exception:

2010-10-12 07:27:30,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-892355450837523222_368657 src: /10.43.102.69:42352 dest: /10.43.102.69:50010
2010-10-12 07:27:30,206 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-892355450837523222_368657 received exception java.io.EOFException
2010-10-12 07:27:30,206 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)
2010-10-12 07:27:30,272 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_786696549206331718_368657 src: /10.184.82.24:53457 dest: /10.43.102.69:50010
2010-10-12 07:27:30,459 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-6729043740571856940_368657 src: /10.185.13.60:41816 dest: /10.43.102.69:50010
2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.185.13.61:48770, dest: /10.43.102.69:50010, bytes: 1626784, op: HDFS_WRITE, cliID: DFSClient_attempt_201010082153_0040_r_000000_2, srvID: DS-859924705-10.43.102.69-50010-1271546912162, blockid: blk_9216465415312085861_368611
2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_9216465415312085861_368611 terminating
2010-10-12 07:27:30,755 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5680087852988027619_321244
2010-10-12 07:27:30,759 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1637914415591966611_321290

…

2010-10-12 07:27:56,412 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)
2010-10-12 07:27:56,976 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5731266331675183628_321238
2010-10-12 07:27:57,669 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)
2010-10-12 07:27:58,976 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)



Re: Failures in the reducers

Posted by David Rosenstrauch <da...@darose.net>.
We ran into this recently. The solution was to bump up the value of the dfs.datanode.max.xcievers setting.
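
For example, the datanodes have to be restarted for the new value to take effect. A minimal sketch, assuming a 0.20-era tarball install (adjust paths to your layout):

    # run on each datanode: restart so the new dfs.datanode.max.xcievers limit applies
    bin/hadoop-daemon.sh stop datanode
    bin/hadoop-daemon.sh start datanode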

HTH,

DR

On 10/12/2010 03:53 PM, rakesh kothari wrote:
>
> Hi,
>
> My MR job is processing gzipped files, each around 450 MB, and there are 24 of them. The file block size is 512 MB.
>
> This job is failing consistently in the reduce phase with the following exception (below). Any ideas how to troubleshoot this?
>
> [...]


RE: Failures in the reducers

Posted by rakesh kothari <rk...@hotmail.com>.
No. It just runs this job. It's a 7-node cluster with 3 map slots and 2 reduce slots per node.

Date: Tue, 12 Oct 2010 13:23:23 -0700
Subject: Re: Failures in the reducers
From: shrijeet@rocketfuel.com
To: mapreduce-user@hadoop.apache.org

Is your cluster busy doing other things while this job is running?

On Tue, Oct 12, 2010 at 1:15 PM, rakesh kothari <rk...@hotmail.com> wrote:
Thanks Shrijeet. Yeah, sorry both of these logs are from datanodes.

Also, I don't get this error when I run my job on just 1 file (450 MB).

I wonder why this happens in the reduce stage, since I just have 10 reducers and I don't see how those 256 connections are being opened.

-Rakesh

[...]

Re: Failures in the reducers

Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
Is your cluster busy doing other things while this job is running?
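
(One quick way to check, assuming a node with a configured hadoop client — this lists the jobs the JobTracker currently knows about:)

    # list MapReduce jobs that are currently running (not yet complete)
    hadoop job -list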

On Tue, Oct 12, 2010 at 1:15 PM, rakesh kothari <rk...@hotmail.com> wrote:

>  Thanks Shrijeet. Yeah, sorry both of these logs are from datanodes.
>
> Also, I don't get this error when I run my job on just 1 file (450 MB).
>
> I wonder why this happens in the reduce stage, since I just have 10 reducers
> and I don't see how those 256 connections are being opened.
>
> -Rakesh
>
> [...]

RE: Failures in the reducers

Posted by rakesh kothari <rk...@hotmail.com>.
Thanks Shrijeet. Yeah, sorry both of these logs are from datanodes.

Also, I don't get this error when I run my job on just 1 file (450 MB).

I wonder why this happens in the reduce stage, since I just have 10 reducers and I don't see how those 256 connections are being opened.
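
(A coarse way to count them directly, assuming shell access to the affected datanode — 50010 is the data-transfer port in the logs above, and each active DataXceiver roughly corresponds to one TCP connection:)

    # count TCP connections on the datanode's data-transfer port
    netstat -tn | grep -c ':50010'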

-Rakesh

Date: Tue, 12 Oct 2010 13:02:16 -0700
Subject: Re: Failures in the reducers
From: shrijeet@rocketfuel.com
To: mapreduce-user@hadoop.apache.org

Rakesh, that error log looks like it belongs to the DataNode and not the NameNode. Anyway, try pumping the parameter named dfs.datanode.max.xcievers up (shoot for 512). This param belongs to core-site.xml.

-Shrijeet

On Tue, Oct 12, 2010 at 12:53 PM, rakesh kothari <rk...@hotmail.com> wrote:
Hi,

My MR job is processing gzipped files, each around 450 MB, and there are 24 of them. The file block size is 512 MB.

This job is failing consistently in the reduce phase with the following exception (below). Any ideas how to troubleshoot this?

[...]

Re: Failures in the reducers

Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
Rakesh,
That error log looks like it belongs to the DataNode and not the NameNode. Anyway, try pumping the parameter named *dfs.datanode.max.xcievers* up (shoot for 512). This param belongs to core-site.xml.
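
For example, a minimal sketch of the property (note the historically misspelled name; it is typically placed in hdfs-site.xml, though the datanode reads core-site.xml as well, and datanodes need a restart to pick it up):

    <!-- raise the concurrent DataXceiver limit from the default of 256 -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>512</value>
    </property>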

-Shrijeet

On Tue, Oct 12, 2010 at 12:53 PM, rakesh kothari <rk...@hotmail.com> wrote:

>  Hi,
>
> My MR job is processing gzipped files, each around 450 MB, and there are 24
> of them. The file block size is 512 MB.
>
> This job is failing consistently in the reduce phase with the following
> exception (below). Any ideas how to troubleshoot this?
>
> Thanks,
> -Rakesh
>
> Datanode logs:
>
> [...]