Posted to user@hadoop.apache.org by Jason Yang <li...@gmail.com> on 2012/09/20 16:28:48 UTC

Job failed with large volume of small data: java.io.EOFException

Hi, all

I have encountered a weird problem: I have an MR job that always fails
when there is a large number of input files (e.g. 400 input files), but
always succeeds when there are only a few input files (e.g. 20 input files).

In this job, the map phase reads all the input files and interprets
each of them as a set of records. The intermediate output of the mapper is
<record.type, record>, and the reducer simply writes records of the same type to the
same file using a MultipleSequenceFileOutputFormat (roughly sketched below).
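
The job is structured roughly like this (a simplified sketch; the parsing
logic and class names are placeholders, not my actual code):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleSequenceFileOutputFormat;

// Mapper: interpret each input record and emit <record.type, record>
public class RecordTypeMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void map(LongWritable offset, Text record,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // placeholder parsing: assume the first comma-separated field is the type
    String type = record.toString().split(",", 2)[0];
    output.collect(new Text(type), record);
  }
}

// In its own file. Output format for the reduce side: records sharing a type
// (the key) go to the same sequence file; the reducer is just a pass-through.
public class TypeSequenceFileOutputFormat
    extends MultipleSequenceFileOutputFormat<Text, Text> {

  @Override
  protected String generateFileNameForKeyValue(Text key, Text value, String name) {
    // prefix the default part name with the record type
    return key.toString() + "-" + name;
  }
}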

According to the running status attached below, all the reducers
failed, and the error is an EOFException, which makes me even more
confused.

Are there any suggestions on how to fix this?

-----
Hadoop job_201209191629_0013 on node10<http://localhost:50030/jobtracker.jsp>
*User:* root
*Job Name:* localClustering.jar
*Job File:*
hdfs://node10:9000/mnt/md5/mapred/system/job_201209191629_0013/job.xml<http://localhost:50030/jobconf.jsp?jobid=job_201209191629_0013>
*Job Setup:* Successful<http://localhost:50030/jobtasks.jsp?jobid=job_201209191629_0013&type=setup&pagenum=1&state=completed>
*Status:* Failed
*Started at:* Thu Sep 20 21:55:11 CST 2012
*Failed at:* Thu Sep 20 22:03:51 CST 2012
*Failed in:* 8mins, 40sec
*Job Cleanup:* Successful<http://localhost:50030/jobtasks.jsp?jobid=job_201209191629_0013&type=cleanup&pagenum=1&state=completed>
------------------------------
Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map      100.00%      400         0         0         400        0        0 / 8
reduce   100.00%      7           0         0         0          7        19 / 7


Counter                      Map           Reduce   Total
Job Counters
  Launched reduce tasks      0             0        26
  Rack-local map tasks       0             0        1
  Launched map tasks         0             0        408
  Data-local map tasks       0             0        407
  Failed reduce tasks        0             0        1
FileSystemCounters
  HDFS_BYTES_READ            899,202,342   0        899,202,342
  FILE_BYTES_WRITTEN         742,195,952   0        742,195,952
  HDFS_BYTES_WRITTEN         1,038,960     0        1,038,960
Map-Reduce Framework
  Combine output records     0             0        0
  Map input records          400           0        400
  Spilled Records            992,124       0        992,124
  Map output bytes           738,140,256   0        738,140,256
  Map input bytes            567,520,400   0        567,520,400
  Map output records         992,124       0        992,124
  Combine input records      0             0        0

task_201209191629_0013_r_000000
<http://localhost:50030/taskdetails.jsp?jobid=job_201209191629_0013&tipid=task_201209191629_0013_r_000000>node6
<http://node6:50060/>
FAILED

java.io.EOFException
	at java.io.DataInputStream.readByte(DataInputStream.java:250)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
	at org.apache.hadoop.io.Text.readString(Text.java:400)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)



*syslog logs*

ReduceTask: Read 267038 bytes from map-output for
attempt_201209191629_0013_m_000392_0
2012-09-21 05:57:25,249 INFO org.apache.hadoop.mapred.ReduceTask: Rec
#1 from attempt_201209191629_0013_m_000392_0 -> (15, 729) from node6
2012-09-21 05:57:25,435 INFO org.apache.hadoop.mapred.ReduceTask:
GetMapEventsThread exiting
2012-09-21 05:57:25,435 INFO org.apache.hadoop.mapred.ReduceTask:
getMapsEventsThread joined.
2012-09-21 05:57:25,436 INFO org.apache.hadoop.mapred.ReduceTask:
Closed ram manager
2012-09-21 05:57:25,437 INFO org.apache.hadoop.mapred.ReduceTask:
Interleaved on-disk merge complete: 1 files left.
2012-09-21 05:57:25,437 INFO org.apache.hadoop.mapred.ReduceTask:
In-memory merge complete: 72 files left.
2012-09-21 05:57:25,446 INFO org.apache.hadoop.mapred.Merger: Merging
72 sorted segments
2012-09-21 05:57:25,447 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 42 segments left of total size: 22125176
bytes
2012-09-21 05:57:25,755 INFO org.apache.hadoop.mapred.ReduceTask:
Merged 72 segments, 22125236 bytes to disk to satisfy reduce memory
limit
2012-09-21 05:57:25,757 INFO org.apache.hadoop.mapred.ReduceTask:
Merging 2 files, 108299192 bytes from disk
2012-09-21 05:57:25,758 INFO org.apache.hadoop.mapred.ReduceTask:
Merging 0 segments, 0 bytes from memory into reduce
2012-09-21 05:57:25,758 INFO org.apache.hadoop.mapred.Merger: Merging
2 sorted segments
2012-09-21 05:57:25,764 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 2 segments left of total size: 108299184
bytes
2012-09-21 05:57:29,727 INFO org.apache.hadoop.hdfs.DFSClient:
Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:29,727 INFO org.apache.hadoop.hdfs.DFSClient:
Abandoning block blk_-2683295125469062550_13791
2012-09-21 05:57:35,734 INFO org.apache.hadoop.hdfs.DFSClient:
Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:35,734 INFO org.apache.hadoop.hdfs.DFSClient:
Abandoning block blk_2048430611271251978_13803
2012-09-21 05:57:41,742 INFO org.apache.hadoop.hdfs.DFSClient:
Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:41,742 INFO org.apache.hadoop.hdfs.DFSClient:
Abandoning block blk_4739785392963375165_13815
2012-09-21 05:57:47,749 INFO org.apache.hadoop.hdfs.DFSClient:
Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:47,749 INFO org.apache.hadoop.hdfs.DFSClient:
Abandoning block blk_-6981138506714889098_13819
2012-09-21 05:57:53,753 WARN org.apache.hadoop.hdfs.DFSClient:
DataStreamer Exception: java.io.IOException: Unable to create new
block.
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2845)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)

2012-09-21 05:57:53,753 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_-6981138506714889098_13819 bad datanode[0]
nodes == null
2012-09-21 05:57:53,754 WARN org.apache.hadoop.hdfs.DFSClient: Could
not get block locations. Source file
"/work/icc/intermediateoutput/lc/RR/_temporary/_attempt_201209191629_0013_r_000000_0/RR-LC-022033-1"
- Aborting...
2012-09-21 05:57:54,539 WARN org.apache.hadoop.mapred.TaskTracker:
Error running child
java.io.EOFException
	at java.io.DataInputStream.readByte(DataInputStream.java:250)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
	at org.apache.hadoop.io.Text.readString(Text.java:400)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
2012-09-21 05:57:54,542 INFO org.apache.hadoop.mapred.TaskRunner:
Runnning cleanup for the task




-- 
YANG, Lin

Re: Job failed with large volume of small data: java.io.EOFException

Posted by Bejoy Ks <be...@gmail.com>.
Hi Jason

Are you seeing any errors in your datanode logs, specifically something like
'xceivers count exceeded'? In that case you may need to bump up the value of
dfs.datanode.max.xcievers to a higher value.
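
For example, something like this in hdfs-site.xml on every datanode,
followed by a datanode restart (4096 is just an example value; pick what
suits your load):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>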

If not, it is possible that you are hitting the upper limit on open files
on the Linux boxes that run the DNs. You can check the current value using
'ulimit -n' and then try increasing it to a much higher value.
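
For example, on each DN box (32768 is just an example value, and 'hadoop'
here stands for whatever user actually runs the datanode process):

  # check the current open-file limit for the datanode user
  ulimit -n

  # make a higher limit persistent in /etc/security/limits.conf
  hadoop    soft    nofile    32768
  hadoop    hard    nofile    32768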

Regards
Bejoy KS
