Posted to common-user@hadoop.apache.org by Hayden Marchant <ha...@amobee.com> on 2014/11/05 10:55:32 UTC

Failure to write to HDFS in MapReduce job

I have a MapReduce job running on Hadoop 2.0.0, and on some 'heavy' jobs, I am seeing the following errors in the reducer. 


2014-11-04 13:30:57,761 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-60005389-172.30.21.49-1379424439243:blk_-4575496846575688807_62439186
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:695)
2014-11-04 13:30:57,842 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-60005389-172.30.21.49-1379424439243:blk_-4575496846575688807_62439186 in pipeline 172.30.120.143:50010, 172.30.120.186:50010: bad datanode 172.30.120.143:50010
2014-11-04 13:33:09,707 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-60005389-172.30.21.49-1379424439243:blk_-4575496846575688807_62439488
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:695)

Once this error occurs, every time the code subsequently tries to write to HDFS, it gets a different error:

java.io.IOException: All datanodes 172.30.120.193:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:960)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:780)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)

This error happens for EVERY write.

btw, 172.30.21.49 is our NameNode, and 172.30.120.193 is the slave on which this task was running.

What should I be looking at to stop this from happening? Could it be resource contention somewhere? I looked at the NameNode console and we have enough disk space.

Clearly, I want to avoid this happening, and I would also like a recommendation on what to do if it does happen. Currently, the exception is caught and a counter is incremented; maybe we should be throwing the exception up so that the task is retried somewhere else.
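
For illustration, something along these lines is what I have in mind (a simplified sketch, not our actual reducer; the class, key/value types, and counter names are made up):

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Simplified sketch: rethrow the write failure instead of only counting it,
    // so the framework fails this task attempt and reschedules it
    // (up to mapreduce.reduce.maxattempts), possibly on a different node.
    public class WriteOrFailReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            try {
                for (Text value : values) {
                    context.write(key, value);  // the write goes to HDFS via the output format
                }
            } catch (IOException e) {
                // Keep the counter for visibility...
                context.getCounter("hdfs", "write-failures").increment(1);
                // ...but propagate the failure instead of swallowing it.
                throw e;
            }
        }
    }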

Any recommendations/advice are welcome.

Thanks,
Hayden

Re: Failure to write to HDFS in MapReduce job

Posted by Azuryy Yu <az...@gmail.com>.
Please check your network; this is generally caused by an unstable network device.
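
For example, a quick sanity check from the task node is simply opening a TCP connection to the DataNode's data transfer port. A rough sketch (the host and port are the ones reported as "bad" in your log; the timeout is arbitrary):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class DataNodeProbe {
        public static void main(String[] args) {
            // DataNode data-transfer address taken from the log above
            String host = "172.30.120.143";
            int port = 50010;
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 5000); // 5-second timeout
                System.out.println("Connected to " + host + ":" + port);
            } catch (IOException e) {
                System.out.println("Cannot reach " + host + ":" + port + ": " + e);
            }
        }
    }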
