Posted to common-user@hadoop.apache.org by Ziawasch Abedjan <zi...@yahoo.de> on 2010/01/06 23:18:51 UTC

mapper runs into deadlock when using custom InputReader

Hi,

we have an application that runs into a never-ending mapper
routine when we start it with more than one mapper. If we
start the application on a cluster or pseudo-distributed cluster with only one
mapper and one reducer, it runs fine. We use a custom FileInputFormat
with a custom RecordReader; their code is attached.
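
Since attachments do not always survive the list archive, here is a minimal
sketch of the shape of that pair in the old mapred API. It is illustrative
only, not our actual attached code: LongFileInputFormat is a made-up name,
and the record format here is an assumed fixed width of 8 bytes.

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class LongFileInputFormat extends FileInputFormat<LongWritable, Text> {
    @Override
    public RecordReader<LongWritable, Text> getRecordReader(
            InputSplit split, JobConf conf, Reporter reporter) throws IOException {
        return new LongRecordReader((FileSplit) split, conf);
    }
}

class LongRecordReader implements RecordReader<LongWritable, Text> {
    private static final Log LOG = LogFactory.getLog(LongRecordReader.class);

    private final long start;
    private final long end;
    private long pos;
    private final FSDataInputStream in;

    LongRecordReader(FileSplit split, JobConf conf) throws IOException {
        start = split.getStart();
        end = start + split.getLength();
        pos = start;
        LOG.info("Splitting from " + start + " to " + end
                + " length: " + split.getLength());
        Path file = split.getPath();
        in = file.getFileSystem(conf).open(file);
        in.seek(start);
    }

    public boolean next(LongWritable key, Text value) throws IOException {
        // End of this split: if this check never fires, the framework
        // keeps calling next() forever and the map task never finishes.
        if (pos >= end) {
            return false;
        }
        key.set(pos);
        long record = in.readLong(); // assumed fixed-width 8-byte records
        value.set(Long.toString(record));
        pos = in.getPos();
        return true;
    }

    public LongWritable createKey() { return new LongWritable(); }
    public Text createValue() { return new Text(); }
    public long getPos() { return pos; }
    public float getProgress() {
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }
    public void close() throws IOException { in.close(); }
}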

This is the mapper function. For clarity I removed most of the code,
since there is no error within the map function itself. As the log
messages below for a run with two mappers show, both mappers run
completely through the map code with no error. The problem lies somewhere
after the map and before the reduce part of the run, and as said before,
it only occurs when we use more than one mapper. Once the job reaches 50%
mapping and 16% reducing, it stops responding and runs forever.

public void map(LongWritable key, Text value,
        OutputCollector<LongWritable, Text> output,
        Reporter reporter) throws IOException {

    LOG.info("Masks: "
            + BinaryStringConverter.parseLongToBinaryString(MASKS[0]) + ", "
            + BinaryStringConverter.parseLongToBinaryString(MASKS[1]) + ", "
            + BinaryStringConverter.parseLongToBinaryString(MASKS[2]) + ", "
            + BinaryStringConverter.parseLongToBinaryString(MASKS[3]));

    // ... (rest of the map body removed for clarity) ...

    LOG.info("Finished with mapper commands.");
}
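
For completeness, the job is wired up in the usual old-API way. The
following is a from-memory sketch, not our actual driver:
DuplicateFinderDriver is a made-up name, and I am assuming here that
DuplicateFinder itself implements Mapper, as the log lines suggest.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class DuplicateFinderDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(DuplicateFinder.class);
        conf.setJobName("duplicate-finder");

        conf.setInputFormat(LongFileInputFormat.class); // custom format sketched above
        conf.setMapperClass(DuplicateFinder.class);     // assumption: implements Mapper
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setNumReduceTasks(2);                      // matches "numReduceTasks: 2" in the logs

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}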



When we start the application with more than one mapper, every mapper
reaches the last LOG.info output of the map function.

But the output logs of the two mappers look like this:
Mapper that failed:

2010-01-05 15:34:16,640 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2010-01-05 15:34:16,796 INFO de.hpi.hadoop.duplicates.LongRecordReader: Splitting from 0 to 800 length: 800
2010-01-05 15:34:16,828 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 2
2010-01-05 15:34:16,859 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2010-01-05 15:34:25,609 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2010-01-05 15:34:25,609 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
.............
.......
2010-01-05 15:34:26,828 INFO de.hpi.hadoop.duplicates.DuplicateFinder: Finished with mapper commands.

Mapper that does not fail:

2010-01-05 15:34:16,656 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2010-01-05 15:34:16,828 INFO de.hpi.hadoop.duplicates.LongRecordReader: Splitting from 800 to 1600 length: 800
2010-01-05 15:34:16,843 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 2
2010-01-05 15:34:16,859 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2010-01-05 15:34:25,609 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2010-01-05 15:34:25,609 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
.............
.......
2010-01-05 15:34:26,765 INFO de.hpi.hadoop.duplicates.DuplicateFinder: Finished with mapper commands.
2010-01-05 15:34:26,765 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2010-01-05 15:34:27,531 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2010-01-05 15:34:27,578 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201001051529_0002_m_000001_0 is done. And is in the process of commiting
2010-01-05 15:34:27,656 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201001051529_0002_m_000001_0' done.
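
What strikes me when comparing the two logs: the failing mapper never
reaches "Starting flush of map output". If I read the framework sources
correctly, the old-API runner drives the reader with a loop roughly like
the following. This is paraphrased from memory from
org.apache.hadoop.mapred.MapRunner (0.20-era), so treat it as a sketch,
not the literal source:

// Paraphrased, simplified sketch of org.apache.hadoop.mapred.MapRunner.run();
// "mapper" is the Mapper instance created in configure().
public void run(RecordReader<K1, V1> input, OutputCollector<K2, V2> output,
        Reporter reporter) throws IOException {
    try {
        K1 key = input.createKey();
        V1 value = input.createValue();
        // After the last record of a split, next() is called one more time
        // and must return false, otherwise this loop never terminates.
        while (input.next(key, value)) {
            mapper.map(key, value, output, reporter);
        }
    } finally {
        mapper.close();
    }
}

So after our last "Finished with mapper commands." line the framework
still calls next() on our reader at least once more before it can start
the flush; the hang therefore seems to sit in that window, which points
at how our RecordReader detects the end of its split.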


Please share if you have faced a similar problem, if you know the solution, or if you need more information.


Thanks,
Ziawasch Abedjan

