Posted to mapreduce-user@hadoop.apache.org by Marcin Sieniek <ms...@wirtualne-tatry.pl> on 2010/05/26 18:40:21 UTC

"IOException: Filesystem closed." when trying to commit reduce output.

Hi there,

I've got a simple MapReduce application that works perfectly when I use
NFS as the underlying filesystem (not using HDFS at all).
I've also got a working HDFS configuration - the grep example runs fine
with it.

However, when I try to run the same application on HDFS instead of NFS, I
keep receiving an "IOException: Filesystem closed" exception and the job
fails.
I've spent a day searching for a solution with Google and scanning through
old archives, but no results so far...

Job summary is:
--->output
10/05/26 17:29:13 INFO mapred.JobClient: Job complete: job_201005261710_0002
10/05/26 17:29:13 INFO mapred.JobClient: Counters: 4
10/05/26 17:29:13 INFO mapred.JobClient:   Job Counters
10/05/26 17:29:13 INFO mapred.JobClient:     Rack-local map tasks=12
10/05/26 17:29:13 INFO mapred.JobClient:     Launched map tasks=16
10/05/26 17:29:13 INFO mapred.JobClient:     Data-local map tasks=4
10/05/26 17:29:13 INFO mapred.JobClient:     Failed map tasks=1

Each map task attempt's log reads something like this:
--->attempt_201005261710_0001_m_000000_3/syslog:
2010-05-26 17:13:47,297 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2010-05-26 17:13:47,470 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2010-05-26 17:13:47,688 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2010-05-26 17:13:47,688 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
2010-05-26 17:13:47,712 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2010-05-26 17:13:47,784 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2010-05-26 17:13:47,788 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201005261710_0001_m_000000_3 is done. And is in the process of commiting
2010-05-26 17:13:47,797 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: Filesystem closed
         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:226)
         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:617)
         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.needsTaskCommit(FileOutputCommitter.java:217)
         at org.apache.hadoop.mapred.Task.done(Task.java:671)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:309)
         at org.apache.hadoop.mapred.Child.main(Child.java:170)
2010-05-26 17:13:47,802 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
2010-05-26 17:13:47,802 WARN org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Error discarding output
java.io.IOException: Filesystem closed
         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:226)
         at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:580)
         at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:227)
         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.abortTask(FileOutputCommitter.java:179)
         at org.apache.hadoop.mapred.Task.taskCleanup(Task.java:815)
         at org.apache.hadoop.mapred.Child.main(Child.java:191)

No reduce tasks are run, since the map tasks never manage to commit their
output.

These exceptions are visible in the JobTracker's log as well. What is the
reason for this exception? Is it critical? (I guess it is, but it's listed
in the JobTracker's log as INFO, not ERROR.)
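
One thing I noticed while staring at the trace: both failures come from
DFSClient.checkOpen(), which as far as I can tell throws "Filesystem closed"
once the client has been shut down. Since FileSystem.get() hands back a
cached instance that (I believe) is shared with the framework code running
in the same task JVM, would closing that handle somewhere in my own code be
enough to break the committer? Something like this hypothetical mapper (the
class name and path below are made up, just to illustrate the pattern I'm
asking about):

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper, not my actual code - just the pattern I'm asking about.
public class SideFileMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // FileSystem.get() returns a cached instance which, as I understand
        // it, is shared with the framework code in this task JVM.
        FileSystem fs = FileSystem.get(context.getConfiguration());
        boolean exists = fs.exists(new Path("/some/side/path"));
        context.write(value, new Text(Boolean.toString(exists)));
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        // Closing the shared instance here would (I assume) make the later
        // FileOutputCommitter.needsTaskCommit() call fail with
        // "Filesystem closed".
        FileSystem.get(context.getConfiguration()).close();
    }
}

Is that the sort of thing I should be grepping my code for?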

My config files are below (I'm not sure which directories should be local
and which should live on HDFS - maybe the issue is somewhere here? See also
the quick check I sketch right after the config files):

---->core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://blade02:5432/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop/tmp</value> <!-- local -->
</property>

</configuration>

---->hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/tmp/hadoop/name2</value> <!-- local dir where HDFS is located-->
</property>
<property>
<name>dfs.data.dir</name>
<value>/tmp/hadoop/data</value> <!-- local dir where HDFS is located -->
</property>
</configuration>

---->mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>blade02:5435</value>
</property>
<property>
<name>mapred.temp.dir</name>
<value>mapred_tmp</value> <!-- on HDFS I suppose -->
</property>
<property>
<name>mapred.system.dir</name>
<value>system</value> <!-- on HDFS I suppose -->
</property>
<property>
<name>mapred.local.dir</name>
<value>/tmp/hadoop/local</value> <!-- local -->
</property>
<property>
<name>mapred.task.tracker.http.address</name>
<value>0.0.0.0:0</value>
</property>
<property>
<name>mapred.textoutputformat.separator</name>
<value>,</value>
</property>
</configuration>
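
Since mapred.temp.dir and mapred.system.dir are given as relative paths,
here is the kind of throwaway check I had in mind to see where they actually
resolve. This is only a sketch: it assumes core-site.xml and mapred-site.xml
are on the classpath, and I'm not certain it mirrors exactly how the
JobTracker qualifies these paths.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Throwaway check: prints the fully-qualified form of the relative
// directories from mapred-site.xml against whatever fs.default.name points to.
public class ResolveDirs {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // A plain Configuration only loads core-site.xml by default, so pull
        // in mapred-site.xml explicitly (assuming it is on the classpath).
        conf.addResource("mapred-site.xml");

        FileSystem fs = FileSystem.get(conf);  // the default FS (fs.default.name)
        System.out.println("default FS    = " + fs.getUri());
        System.out.println("working dir   = " + fs.getWorkingDirectory());
        System.out.println("mapred.system.dir -> "
                + new Path(conf.get("mapred.system.dir", "system")).makeQualified(fs));
        System.out.println("mapred.temp.dir   -> "
                + new Path(conf.get("mapred.temp.dir", "mapred_tmp")).makeQualified(fs));
    }
}

If those two should instead be absolute HDFS paths (or local paths), that
might explain part of my confusion.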

I'm using Hadoop 0.20.2 (new API -> org.apache.hadoop.mapreduce.*, with the
default OutputFormat and RecordWriter), running on a 3-node cluster
(blade02, blade03, blade04). blade02 is the master; all three nodes are
slaves. My OS: Linux blade02 2.6.9-42.0.2.ELsmp #1 SMP Tue Aug 22
17:26:55 CDT 2006 i686 i686 i386 GNU/Linux.

Note that there are currently three filesystems in my configuration:
/tmp/* - local fs on each node
/home/* - NFS shared by all nodes - this is where Hadoop is installed
hdfs://blade02:5432/* - HDFS

I'm not sure if this is relevant, but the intermediate (key, value) pairs
are of type (Text, TermVector), and TermVector's Writable methods are
implemented like this:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class TermVector implements Writable {
    private Map<Text, IntWritable> vec = new HashMap<Text, IntWritable>();

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize as a size followed by (term, count) pairs.
        out.writeInt(vec.size());
        for (Map.Entry<Text, IntWritable> e : vec.entrySet()) {
            e.getKey().write(out);
            e.getValue().write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Hadoop reuses Writable instances, so clear any previous contents
        // before deserializing into this object.
        vec.clear();
        int n = in.readInt();
        for (int i = 0; i < n; ++i) {
            Text t = new Text();
            t.readFields(in);
            IntWritable iw = new IntWritable();
            iw.readFields(in);
            vec.put(t, iw);
        }
    }
    ...
}
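
To at least rule out the Writable itself, this is the sort of quick
round-trip check I'd run (a rough sketch only: the put() calls assume a
mutator hidden in the elided part of the class, and the final byte
comparison relies on both HashMaps iterating in the same order, which should
hold for identical contents):

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Serialization round-trip check for TermVector.
// Assumes a put(Text, IntWritable) method in the elided part of the class;
// adjust to whatever the real mutator is.
public class TermVectorRoundTrip {
    public static void main(String[] args) throws IOException {
        TermVector original = new TermVector();
        original.put(new Text("hadoop"), new IntWritable(3));
        original.put(new Text("hdfs"), new IntWritable(1));

        // Serialize the original.
        DataOutputBuffer out = new DataOutputBuffer();
        original.write(out);

        // Deserialize into a fresh instance.
        DataInputBuffer in = new DataInputBuffer();
        in.reset(out.getData(), out.getLength());
        TermVector copy = new TermVector();
        copy.readFields(in);

        // Re-serialize the copy and compare the raw bytes, so the check
        // doesn't depend on TermVector implementing equals().
        DataOutputBuffer out2 = new DataOutputBuffer();
        copy.write(out2);

        byte[] a = Arrays.copyOf(out.getData(), out.getLength());
        byte[] b = Arrays.copyOf(out2.getData(), out2.getLength());
        System.out.println("round trip ok: " + Arrays.equals(a, b));
    }
}

That check passes for me on the NFS setup, so I suspect the problem is in my
configuration rather than in the (de)serialization, but I'm including the
class just in case.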

Any help appreciated.

Many thanks,
Marcin Sieniek