Posted to common-user@hadoop.apache.org by Michael Basnight <mb...@gmail.com> on 2010/01/08 17:17:23 UTC

MapFile in mapper showing weird values

I have created a very small MapFile for testing purposes. When I use this file as the input for a Hadoop job (with SequenceFileInputFormat), the mapper receives a very strange LongWritable value that I never wrote to the MapFile. I even had to go so far as to change my Mapper from <Text, Text, ...> to <Text, Writable, ...> to accommodate this. I'm using Hadoop 0.20.0 and HadoopTestCase (Local_MR, Local_FS, 1, 1). I have not tested this in a production environment, FWIW. Does anyone have an idea what's going on? All replies appreciated!

MapFile creation:

MapFile.Writer w = new MapFile.Writer(tool.getConfiguration().getJobConf(), getFileSystem(), inputDir + "/part-00000", Text.class, Text.class);
Text t;
t = new Text("apple");
w.append(t, new Text("orange"));
t = new Text("bar");
w.append(t, new Text("foo"));
t = new Text("foo");
w.close();

Beginning of the mapper class:

public class MapSortJoinMapper implements Mapper<Text, Writable, Text, Text> {

    public void map(Text key, Writable value, OutputCollector<Text, Text> textTextOutputCollector, Reporter reporter) throws IOException {
        try {
            System.out.println(key + ":" + key.getClass());
            System.out.println(value + ":" + value.getClass());
...

Output of the mapper class:

apple:class org.apache.hadoop.io.Text
orange:class org.apache.hadoop.io.Text
bar:class org.apache.hadoop.io.Text
foo:class org.apache.hadoop.io.Text
10/01/08 10:14:58 INFO mapred.MapTask: Starting flush of map output
10/01/08 10:14:59 INFO mapred.MapTask: Finished spill 0
10/01/08 10:14:59 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
10/01/08 10:14:59 INFO mapred.LocalJobRunner: file:/tmp/input/test_mapfile/part-00000/data:0+174
10/01/08 10:14:59 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
10/01/08 10:14:59 INFO compress.CodecPool: Got brand-new decompressor
10/01/08 10:14:59 INFO compress.CodecPool: Got brand-new decompressor
10/01/08 10:14:59 INFO compress.CodecPool: Got brand-new decompressor
10/01/08 10:14:59 INFO mapred.MapTask: numReduceTasks: 1
10/01/08 10:14:59 INFO mapred.MapTask: io.sort.mb = 100
10/01/08 10:14:59 INFO mapred.MapTask: data buffer = 79691776/99614720
10/01/08 10:14:59 INFO mapred.MapTask: record buffer = 262144/327680
apple:class org.apache.hadoop.io.Text
121:class org.apache.hadoop.io.LongWritable