Posted to common-user@hadoop.apache.org by himanshu chandola <hi...@yahoo.com> on 2010/02/16 22:32:48 UTC

fs errors in reduce

Hi,
I'm struggling with an error while running Hadoop and haven't been able to find a solution. At the end of the map phase, all the reducers get stuck at 0% and then fail. Some fail with this message:

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_201002151946_0001_r_000001_2/intermediate.13
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:313)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
...
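If I read the trace right, this is LocalDirAllocator failing to find room under mapred.local.dir for the intermediate file. If it would help narrow things down, I could exercise the same allocation path outside of a job with a small probe like the one below (AllocProbe is just a throwaway class of mine, not part of Hadoop; the file name and the 1 MB size are arbitrary):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalDirAllocator;
import org.apache.hadoop.fs.Path;

public class AllocProbe {
    public static void main(String[] args) throws Exception {
        // Picks up hadoop-site.xml from the classpath.
        Configuration conf = new Configuration();
        // Same config key the reduce task's allocator uses for scratch space.
        LocalDirAllocator alloc = new LocalDirAllocator("mapred.local.dir");
        // Ask for ~1 MB under the configured local dirs; this is the
        // call that throws DiskErrorException in the reduce above.
        Path p = alloc.getLocalPathForWrite("probe/intermediate.test", 1L << 20, conf);
        System.out.println("allocated: " + p);
    }
}

I can run this on the affected tasktrackers and report what it prints if that would be useful.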

Reducers on some of the nodes fail with this error instead:
java.io.IOException: All datanodes 10.42.255.203:50010 are bad. Aborting...
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2168)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
...
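For the second error, to see whether HDFS writes fail on those nodes independently of MapReduce, I could also run a standalone write from one of them, roughly like this (WriteProbe is my own throwaway class; the path and sizes are arbitrary, and ~128 MB is just meant to span more than one block at the default 64 MB block size):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteProbe {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path probe = new Path("/tmp/write-probe");

        // Write ~128 MB so the client has to set up more than one block
        // pipeline; "All datanodes ... are bad" comes from
        // processDatanodeError giving up on that pipeline.
        FSDataOutputStream out = fs.create(probe, true);
        byte[] chunk = new byte[64 * 1024];
        for (int i = 0; i < 2048; i++) {
            out.write(chunk);
        }
        out.close();

        System.out.println("write succeeded");
        fs.delete(probe, false);
    }
}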
For the first error, I checked whether the directory 'attempt_*' existed. It did, but the file intermediate.13 didn't exist (the intermediate files only went up to intermediate.12).

I also checked the filesystem health and it looks good, and restarting Hadoop works without any errors. The nodes have sufficient free space, so that shouldn't be the problem either.
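In case I'm measuring free space on the wrong partitions, I can also double-check each node with something along these lines (FreeSpaceCheck is a throwaway helper of mine; the fallback path is only a made-up example, and File.getUsableSpace needs Java 6):

import java.io.File;
import org.apache.hadoop.conf.Configuration;

public class FreeSpaceCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // mapred.local.dir is a comma-separated list of scratch dirs;
        // the fallback here is just an example path.
        String dirs = conf.get("mapred.local.dir", "/tmp/hadoop/mapred/local");
        for (String d : dirs.split(",")) {
            File f = new File(d.trim());
            System.out.println(f + ": exists=" + f.exists()
                + " writable=" + f.canWrite()
                + " freeMB=" + (f.getUsableSpace() / (1024 * 1024)));
        }
    }
}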


If anyone has suggestions or ideas about what to check next, I'd really appreciate them.

Thanks

H

Morpheus: Do you believe in fate, Neo?
Neo: No.
Morpheus: Why not?
Neo: Because I don't like the idea that I'm not in control of my life.