Posted to common-user@hadoop.apache.org by Dan Bretherton <da...@mail.nerc-essc.ac.uk> on 2006/03/30 21:13:12 UTC

Hadoop evaluation: Reliability issues and NFS

Dear Hadoop users and developers,

I have been evaluating Hadoop with a view to using it as a distributed 
computing platform for performing fast operations on locally stored data.  I 
have found several problems that are causing Hadoop to be unreliable on our 
network of 13 Sun Sparcs and 9 Linux desktop PCs.  I am running a simple test 
application that searches through an environmental data set containing 1800 
files of 22MB each, looking for files containing temperatures above a certain 
level.  We do not need to use the distributed filesystem in Hadoop because 
the data and home directories are available on every machine via NFS.  I have 
been running the test repeatedly to find out the average run times for 
different numbers of distributed map operations, but I have not managed to 
run my test program to completion enough times to get any indication of how much 
faster it runs with multiple maps.  I found that reliability was much 
improved when I scaled down the problem to a search through 250 files instead 
of 1800, but for this relatively small task there was no speed advantage in 
using Hadoop at all.  This means that we will probably not investigate Hadoop 
any further for use with our applications, but I thought that it might be helpful 
to post the results of my evaluation on the mailing list.  Here are the 
details of the specific problems that I identified during my tests of the 
nightly build from the 27th March.
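
For reference, the core of my test program is a map task shaped roughly like the 
sketch below.  The class name, helper method and threshold value are invented 
for this message (my real code is more involved), and I am quoting the mapred 
API from memory, so please treat this as a sketch rather than working code.

import java.io.IOException;
import org.apache.hadoop.io.UTF8;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch only: each input record names one 22MB data file; the map opens that
// file (over NFS in our setup) and emits its name if any temperature in it
// exceeds the threshold.
public class ThresholdMapper implements Mapper {
  private static final float THRESHOLD = 25.0f;  // illustrative value only

  public void configure(JobConf job) {}

  public void map(WritableComparable key, Writable value,
                  OutputCollector output, Reporter reporter) throws IOException {
    String fileName = value.toString();
    if (containsTemperatureAbove(fileName, THRESHOLD)) {  // hypothetical helper
      output.collect(new UTF8(fileName), new UTF8("match"));
    }
  }

  // Opens one data file from the NFS-mounted directory and scans its values.
  private boolean containsTemperatureAbove(String fileName, float threshold)
      throws IOException {
    // ... read the file and test each temperature against the threshold ...
    return false;  // placeholder
  }

  public void close() throws IOException {}
}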


The first problem I have is that task tracker processes often die while I am 
running my test program.  Typically, the number of slave nodes that can be 
seen by the job tracker gradually decreases over time, and when I check the 
nodes that have disappeared I find that the task tracker process has stopped 
running on that machine.  The resulting error message in the task tracker log 
file for the disappearing nodes begins with the following lines:

060330 130100 task_m_f8jt6q  SEVERE FSError from child
060330 130100 task_m_f8jt6q org.apache.hadoop.fs.FSError: java.io.IOException: 
Stale NFS file handle
060330 130100 task_m_f8jt6q  at 
org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSystem.java:143)

When I am not running anything, the task trackers happily sit there for hours 
without expiring.


The second problem is that a lot of jobs fail due to failed map and reduce 
operations.  Map tasks usually fail with either of the two following errors:

java.io.IOException: Task process exit with nonzero status. at 
org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273) at ...etc.

... or ...

java.io.FileNotFoundException: /users/dab/Hadoop/input/Occam/.nfs000047ab00000001 
at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:114) 
at ....etc.

... or occasionally ...

java.lang.ClassFormatError: mapreducetest/Main$MapClass (Truncated class file) 
at java.lang.ClassLoader.defineClass0(Native Method) at ....etc.

Reduce tasks usually fail with this error:

java.lang.NullPointerException at java.net.Socket.<init>(Socket.java:301) at 
java.net.Socket.<init>(Socket.java:153) at 
org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:114) at ...etc.

I don't think this is related to the problem of disappearing nodes, because 
tasks often fail during periods when no nodes are being lost.  I have been 
running with one reduce task per job.


The third problem is related to the Hadoop distributed filesystem (DFS).  I 
tried running in DFS mode to see if this improved reliability, though this 
was not a true test of the DFS because of our NFS setup (i.e. all the DFS 
blocks actually end up in my home directory on a single disk).  I should also 
point out that the input data involved in the DFS was just a list of file 
names, not the temperature data itself.  Using the DFS I found that the jobs 
often failed because of problems with missing blocks of data.  Here is a 
typical error message from the job tracker log file.

java.io.IOException: Could not obtain block blk_-3035035931951255964 at 
org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:362) 
at ...etc.

As soon as these errors start to appear it means that the DFS is broken.  
Performing a “get” on the input data directory gives errors like the 
following:

060330 193615 No node available for block blk_-4259713996952710409
060330 193615 Could not obtain block from any node:  java.io.IOException: No 
live nodes contain current block

The only solution I found was to delete the input directory from the DFS and 
then put it back, after which one or two jobs would run to completion before 
the same thing happened again.  It occurred to me that this problem might have 
something to do with the fact that I was still using NFS when running in DFS 
mode, which is obviously not what the DFS was designed for.  No task trackers died 
while I was running in DFS mode, but that may be just a coincidence.


Another factor that affects reliability is the number of maps.  I never 
managed to complete a job with more than one map per node, but my experience 
with another distributed computing platform that I have been evaluating, the 
Distributed Parallel Processing Environment for Java (DPPEJ) from IBM 
(http://www.alphaworks.ibm.com/tech/dppej), showed that there is no advantage 
to having more than a handful of distributed processes simultaneously trying 
to access data on the same disk.  Therefore, the maximum number of maps I 
used was one for each of our 13 Suns.
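
To stay at that maximum I set the task counts in the job configuration along the 
following lines (again quoted from memory and simplified; input and output 
settings are omitted, and as far as I can tell the map count is only a hint to 
the framework):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf();
conf.setNumMapTasks(13);    // at most one map per Sun, to limit disk contention
conf.setNumReduceTasks(1);  // I have been running a single reduce per job
JobClient.runJob(conf);     // submit the job and wait for it to finish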

I don't think that there is anything particularly unusual or unreliable about 
our computers or our network, although I expect that my test application is 
not what Hadoop was really intended for.  When I scaled down my test problem 
to improve reliability I found that Hadoop did not reduce the average run 
time very much at all, probably because the overheads involved in task 
scheduling and tracking outweighed the benefits of parallelisation.  IBM's 
DPPEJ runs my test application well and reduces average run times by about a 
factor of five.

I would be interested to hear any comments or suggestions relating to any of 
these issues, and I will be happy to supply any further details requested.

Regards,
 
-- 
Dan Bretherton
Environmental Systems Science Centre
Harry Pitt Building
3 Earley Gate
Reading University
Reading, RG6 6AL
UK

Tel. +44 118 378 7722

Re: Hadoop evaluation: Reliability issues and NFS

Posted by Doug Cutting <cu...@apache.org>.
Dan Bretherton wrote:
> We do not need to use the distributed filesystem in Hadoop because 
> the data and home directories are available on every machine via NFS.

Writing everything over NFS will seriously affect Hadoop's performance, 
and is not recommended.

> 060330 130100 task_m_f8jt6q  SEVERE FSError from child
> 060330 130100 task_m_f8jt6q org.apache.hadoop.fs.FSError: java.io.IOException: 
> Stale NFS file handle

This looks related to your use of NFS.

> java.io.IOException: Task process exit with nonzero status. at 
> org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273) at ...etc.

That means the JVM running your task crashed.  Enabling core dumps might 
help you figure out why it crashed.

> java.io.FileNotFoundException: /users/dab/Hadoop/input/Occam/.nfs000047ab00000001 
> at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:114) 
> at ....etc.

This looks like another NFS-related problem.

> though this 
> was not a true test of the DFS because of our NFS setup (i.e. all the DFS 
> blocks actually end up in my home directory on a single disk).  I should also 
> point out that the input data involved in the DFS was just a list of file 
> names, not the temperature data itself.  Using the DFS I found that the jobs 
> often failed because of problems with missing blocks of data.  Here is a 
> typical error message from the job tracker log file.
> 
> java.io.IOException: Could not obtain block blk_-3035035931951255964 at 
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:362) 
> at ...etc.
> 
> As soon as these errors start to appear it means that the DFS is broken.  

This could be related to running DFS on top of NFS.  Again, I would not 
recommend that.

Generally I would try running things without using NFS, using local 
volumes for all mapred and dfs directories.  That is the intended use.
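
For example, something along these lines, with the paths replaced by directories 
on local disks on every node (you would normally set these in your site 
configuration file rather than in code, and you can check hadoop-default.xml for 
the exact property names in your build; the paths below are just placeholders):

import org.apache.hadoop.conf.Configuration;

// Placeholders only -- point each property at a local volume on every node.
Configuration conf = new Configuration();
conf.set("mapred.local.dir", "/local/disk0/hadoop/mapred");  // tasktracker working space
conf.set("dfs.data.dir", "/local/disk0/hadoop/dfs/data");    // datanode block storage
conf.set("dfs.name.dir", "/local/disk0/hadoop/dfs/name");    // namenode image and edits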

Doug