Posted to common-user@hadoop.apache.org by Keith Thompson <kt...@binghamton.edu> on 2012/05/02 23:49:56 UTC

Reduce Hangs at 66%

I am running a job which gets to 66% of the Reduce step and then hangs
indefinitely. Here is the log file (I apologize if I am putting too much
here, but I am not exactly sure what is relevant):

2012-05-02 16:42:52,975 INFO org.apache.hadoop.mapred.JobTracker:
Adding task (REDUCE) 'attempt_201202240659_6433_r_000000_0' to tip
task_201202240659_6433_r_000000, for tracker
'tracker_analytix7:localhost.localdomain/127.0.0.1:56515'
2012-05-02 16:42:53,584 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201202240659_6433_m_000001_0' has completed
task_201202240659_6433_m_000001 successfully.
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from attempt_201202240659_6432_r_000000_0: Task
attempt_201202240659_6432_r_000000_0 failed to report status for 1800
seconds. Killing!
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201202240659_6432_r_000000_0'
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
Adding task (TASK_CLEANUP) 'attempt_201202240659_6432_r_000000_0' to
tip task_201202240659_6432_r_000000, for tracker
'tracker_analytix4:localhost.localdomain/127.0.0.1:44204'
2012-05-02 17:00:48,763 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201202240659_6432_r_000000_0'
2012-05-02 17:00:48,957 INFO org.apache.hadoop.mapred.JobTracker:
Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_1' to tip
task_201202240659_6432_r_000000, for tracker
'tracker_analytix5:localhost.localdomain/127.0.0.1:59117'
2012-05-02 17:00:56,559 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from attempt_201202240659_6432_r_000000_1: java.io.IOException:
The temporary job-output directory
hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
doesn't exist!
	at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
	at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
	at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

2012-05-02 17:00:59,903 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201202240659_6432_r_000000_1'
2012-05-02 17:00:59,906 INFO org.apache.hadoop.mapred.JobTracker:
Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_2' to tip
task_201202240659_6432_r_000000, for tracker
'tracker_analytix3:localhost.localdomain/127.0.0.1:39980'
2012-05-02 17:01:07,200 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from attempt_201202240659_6432_r_000000_2: java.io.IOException:
The temporary job-output directory
hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
doesn't exist!
	at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
	at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
	at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

2012-05-02 17:01:10,239 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201202240659_6432_r_000000_2'
2012-05-02 17:01:10,283 INFO org.apache.hadoop.mapred.JobTracker:
Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_3' to tip
task_201202240659_6432_r_000000, for tracker
'tracker_analytix2:localhost.localdomain/127.0.0.1:33297'
2012-05-02 17:01:18,188 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from attempt_201202240659_6432_r_000000_3: java.io.IOException:
The temporary job-output directory
hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
doesn't exist!
	at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
	at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
	at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.TaskInProgress:
TaskInProgress task_201202240659_6432_r_000000 has failed 4 times.
2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobInProgress:
Aborting job job_201202240659_6432
2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobInProgress:
Killing job 'job_201202240659_6432'
2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobTracker:
Adding task (JOB_CLEANUP) 'attempt_201202240659_6432_m_000002_0' to
tip task_201202240659_6432_m_000002, for tracker
'tracker_analytix2:localhost.localdomain/127.0.0.1:33297'
2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201202240659_6432_r_000000_3'
2012-05-02 17:01:22,443 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201202240659_6432_m_000002_0' has completed
task_201202240659_6432_m_000002 successfully.
2012-05-02 17:01:22,443 INFO
org.apache.hadoop.mapred.JobInProgress$JobSummary:
jobId=job_201202240659_6432,submitTime=1335990629637,launchTime=1335990629702,firstMapTaskLaunchTime=1335990631252,firstReduceTaskLaunchTime=1335990633988,firstJobSetupTaskLaunchTime=1335990629727,firstJobCleanupTaskLaunchTime=1335992481228,finishTime=1335992482443,numMaps=2,numSlotsPerMap=1,numReduces=1,numSlotsPerReduce=1,user=thompson,queue=default,status=FAILED,mapSlotSeconds=7,reduceSlotsSeconds=32,clusterMapCapacity=138,clusterReduceCapacity=138
2012-05-02 17:01:22,446 WARN org.apache.hadoop.hdfs.DFSClient:
DataStreamer Exception: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
on /thompson/outputDensity/density1/_logs/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity
File does not exist. Holder DFSClient_1931891231 does not have any
open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1542)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1533)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1449)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
	at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)

	at org.apache.hadoop.ipc.Client.call(Client.java:1104)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
	at $Proxy4.addBlock(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy4.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)

2012-05-02 17:01:22,446 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block null bad datanode[0] nodes == null
2012-05-02 17:01:22,446 WARN org.apache.hadoop.hdfs.DFSClient: Could
not get block locations. Source file
"/thompson/outputDensity/density1/_logs/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity"
- Aborting...
2012-05-02 17:01:22,446 INFO org.apache.hadoop.mapred.JobHistory:
Moving file:/var/log/hadoop/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity
to file:/var/log/hadoop/history/done
2012-05-02 17:01:22,447 ERROR org.apache.hadoop.hdfs.DFSClient:
Exception closing file
/thompson/outputDensity/density1/_logs/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity
: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
on /thompson/outputDensity/density1/_logs/history/analytix1_1330084785363_job_201202240659_6432_thompson_TwitterDensity
File does not exist. Holder DFSClient_1931891231 does not have any
open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1542)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1533)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1449)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
	at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)


Can anyone explain to me what is causing the problem?

-- 
*Keith Thompson*

Re: Reduce Hangs at 66%

Posted by Keith Thompson <kt...@binghamton.edu>.
Thanks everyone for your help. It is running fine now.


On Fri, May 4, 2012 at 11:22 AM, Michael Segel <mi...@hotmail.com> wrote:

> [quoted thread snipped; the replies appear in full below]


-- 
*Keith Thompson*
Graduate Research Associate, Xerox Corporation
SUNY Research Foundation
Dept. of Systems Science and Industrial Engineering
Binghamton University
Binghamton, NY 13902

Re: Reduce Hangs at 66%

Posted by Michael Segel <mi...@hotmail.com>.
Well 
That was one of the things I had asked. 
ulimit -a says it all. 

But you have to do this for the users... hdfs, mapred, and hadoop

(Which is why I asked about which flavor.)
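
(Concretely, checking and raising the limit usually looks something like
the lines below; the user names and the 32768 value are common
recommendations for a CDH cluster, not values reported in this thread:)

# check the open-file limit as each Hadoop service user
su - hdfs   -c 'ulimit -n'
su - mapred -c 'ulimit -n'
su - hadoop -c 'ulimit -n'

# raise it in /etc/security/limits.conf, for example:
hdfs    -   nofile   32768
mapred  -   nofile   32768
hadoop  -   nofile   32768
# (restart the daemons afterwards so they pick up the new limits)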

On May 3, 2012, at 7:03 PM, Raj Vishwanathan wrote:

> Keith
> 
> What is the output for ulimit -n? Your value for the number of open files is probably too low.
> 
> Raj
> 
> 
> 
> 
>> [quoted thread snipped]


Re: Reduce Hangs at 66%

Posted by Raj Vishwanathan <ra...@yahoo.com>.
Keith

What is the output for ulimit -n? Your value for the number of open files is probably too low.

Raj




> [quoted thread snipped]

Re: Reduce Hangs at 66%

Posted by Keith Thompson <kt...@binghamton.edu>.
I am not sure about the ulimits, but I can answer the rest. It's a Cloudera
distribution of Hadoop 0.20.2, and HDFS has 9 TB free. In the reduce step,
I am taking keys of the form (gridID, date), each with a value of 1. The
reducer simply sums the 1s to produce the final output value for each key
(it is counting how many people were in a given gridID on a given day).
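
For reference, the reducer is essentially the following. This is a
trimmed-down sketch rather than my exact code: the class name is made up,
and I'm showing the (gridID, date) key as a plain Text.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sums the 1s the mapper emits for each (gridID, date) key.
public class DensitySumReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();  // every value is a 1
      reporter.progress();         // tell the framework the task is alive
    }
    output.collect(key, new IntWritable(sum));
  }
}

The reporter.progress() call in the loop is an addition of mine here: the
first attempt was killed for failing to report status for 1800 seconds, and
calling progress() inside long-running loops is the usual way to avoid that.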

I have been running other, more complicated jobs with no problem, so I'm not
sure why this one is being peculiar. This is the command I used to run the
program from the command line (the source is a file in HDFS):

hadoop jar <jarfile> <driver> <source> /thompson/outputDensity/density1

The program runs the map to completion, gets to 66% of the reduce, and then
stops responding. The underlying error seems to be:

> Error from attempt_201202240659_6432_r_000000_1: java.io.IOException:
> The temporary job-output directory
> hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
> doesn't exist!

I don't understand what _temporary is. I am assuming it's something Hadoop
creates automatically.
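
From the FileOutputCommitter frames in the stack trace, _temporary appears
to be a working directory that the committer creates under the job's output
path at job setup and deletes again at job cleanup. For completeness, my
driver is wired up roughly like this (a sketch, not my exact code: the
class name TwitterDensity is taken from the job name in the log, the mapper
setup is omitted, and DensitySumReducer is the sketch above):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TwitterDensity {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TwitterDensity.class);
    conf.setJobName("TwitterDensity");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    // conf.setMapperClass(...) omitted; the real mapper emits a 1 for
    // each (gridID, date) pair it sees.
    conf.setReducerClass(DensitySumReducer.class);

    // args[0] is <source>, args[1] is /thompson/outputDensity/density1.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    // FileOutputCommitter creates <output>/_temporary under this path at
    // job setup and moves committed task output out of it, so if that
    // directory disappears mid-job, reducers fail exactly as in the log.
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

If that reading is right, something must have removed the output directory
(or its _temporary subdirectory) while the reducer was still running, which
I can't yet explain.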



On Thu, May 3, 2012 at 5:02 AM, Michel Segel <mi...@hotmail.com> wrote:

> Well...
> Lots of information but still missing some of the basics...
>
> Which release and version?
> What are your ulimits set to?
> How much free disk space do you have?
> What are you attempting to do?
>
> Stuff like that.
>
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 2, 2012, at 4:49 PM, Keith Thompson <kt...@binghamton.edu>
> wrote:
>
> > I am running a task which gets to 66% of the Reduce step and then hangs
> > indefinitely. Here is the log file (I apologize if I am putting too much
> > here but I am not exactly sure what is relevant):
> >
> > [log snipped]
>



-- 
*Keith Thompson*
Graduate Research Associate, Xerox Corporation
SUNY Research Foundation
Dept. of Systems Science and Industrial Engineering
Binghamton University
Binghamton, NY 13902

Re: Reduce Hangs at 66%

Posted by Michel Segel <mi...@hotmail.com>.
Well...
Lots of information but still missing some of the basics...

Which release and version?
What are your ulimits set to?
How much free disk space do you have?
What are you attempting to do?

Stuff like that.
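
For example, the sort of thing I'd check first (commands assuming a typical
CDH 0.20 install):

ulimit -a                # limits for the user the Hadoop daemons run as
hadoop version           # exact release and build
hadoop dfsadmin -report  # configured capacity and free space per datanode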



Sent from a remote device. Please excuse any typos...

Mike Segel

On May 2, 2012, at 4:49 PM, Keith Thompson <kt...@binghamton.edu> wrote:

> I am running a task which gets to 66% of the Reduce step and then hangs
> indefinitely. Here is the log file (I apologize if I am putting too much
> here but I am not exactly sure what is relevant):
> 
> [log snipped]
> 
> 
> Can anyone explain to me what is causing the problem?
> 
> -- 
> *Keith Thompson*