Posted to mapreduce-user@hadoop.apache.org by Adam Shook <as...@clearedgeit.com> on 2011/08/01 23:19:07 UTC

Unusual large number of map tasks for a SequenceFile

Hi All,

I am writing a SequenceFile to HDFS from an application as a pre-processing step for a MapReduce job.  (It isn't being written from an MR job, just open, write, close.)

The file is around 32 MB in size, but when the MapReduce job starts up, it launches 256 map tasks.  That first job writes SequenceFiles as its output, and I fire up a second job using that output as its input.  The second job has around 32 KB of input across 128 part files, yet it starts 138 map tasks; I would expect only 128, one per part file.  These seem like unusually large numbers of map tasks, since the cluster is configured with the default block size of 64 MB.  I am using Hadoop v0.20.1.
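
In case it helps with diagnosis, below is the kind of check I can run to see what length and block size HDFS actually recorded for the second job's input files, since (as I understand it) the splits follow the block size stored per file rather than the cluster-wide default.  This is just a sketch; <output_dir> is a placeholder for the first job's output directory.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem fs = FileSystem.get(new Configuration());
// Print the length and block size HDFS recorded for each part file;
// <output_dir> is a placeholder for the first job's output directory.
for (FileStatus status : fs.listStatus(new Path("<output_dir>"))) {
    System.out.println(status.getPath() + ": len=" + status.getLen()
            + " blockSize=" + status.getBlockSize());
}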

Is there something special about how the SequenceFiles are being written?  Below is a code sample showing how I am writing the first file.

Thanks,
Adam


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;

// Open a writer against the default file system, reusing its configuration.
FileSystem fs = FileSystem.get(new Configuration());
Writer wrtr = SequenceFile.createWriter(fs, fs.getConf(), <path_to_file>, Text.class, Text.class);

// Append every pairing of the two string collections as Text key/value records.
for (String s1 : strings1) {
    for (String s2 : strings2) {
        wrtr.append(new Text(s1), new Text(s2));
    }
}

wrtr.close();