Posted to mapreduce-issues@hadoop.apache.org by "Bejoy KS (JIRA)" <ji...@apache.org> on 2012/08/03 00:53:02 UTC
[jira] [Commented] (MAPREDUCE-4507) IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427695#comment-13427695 ] 

Bejoy KS commented on MAPREDUCE-4507:
-------------------------------------

This piece of code triggers IdentityMapper instead of the custom mapper. No compile-time error is thrown even though the input key type does not match between the class level and the method level.

Main Class
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class WcNewMain extends Configured implements Tool
{
    public int run(String[] args) throws Exception
    {
        //getting configuration object and setting job name
        Configuration conf = getConf();
        Job job = new Job(conf, "Word Count ");

        //setting the class names
        job.setJarByClass(WcNewMain.class);
        job.setMapperClass(WcMapperNew.class);
        //job.setReducerClass(WordCountReducer.class);
        job.setNumReduceTasks(0);

        //setting the output data type classes
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/userdata/bejoy/samples/wc/input"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/userdata/bejoy/samples/wc/output"));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WcNewMain(), args);
        System.exit(res);
    }
}


{code}

Mapper Class
{code}
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WcMapperNew extends Mapper<IntWritable, Text, Text, IntWritable>
{
    //hadoop supported data types
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    //the key type here is LongWritable while the class-level input key type is IntWritable
    //(this is the conflict being reported)
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        //taking one line at a time and tokenizing the same
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);

        //iterating through all the words available in that line and forming the key value pair
        while (tokenizer.hasMoreTokens())
        {
            word.set(tokenizer.nextToken());
            //sending to output collector which in turn passes the same to reducer
            context.write(word, one);
        }
    }
}
{code}
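
For reference, the same behaviour can be reproduced without Hadoop. The sketch below is only an illustration: BaseMapper, LongKey, IntKey and the other class names are hypothetical stand-ins, not Hadoop types, and it assumes a base class whose default map() passes its input through unchanged. Because the subclass's map() takes a different key type than the one declared at class level, Java treats it as an overload rather than an override, so a call made through the base type never reaches it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in key types; these are not Hadoop classes.
class LongKey { }
class IntKey { }

// Stand-in for a framework base class whose default map() is an identity function.
class BaseMapper<K, V> {
    final List<String> output = new ArrayList<>();

    protected void map(K key, V value) {
        output.add("identity:" + value);
    }
}

// Class-level key type is IntKey but the method takes LongKey,
// so this map() is an overload and does not replace BaseMapper.map().
class MismatchedMapper extends BaseMapper<IntKey, String> {
    public void map(LongKey key, String value) {
        output.add("custom:" + value);
    }
}

// Key types agree, so @Override compiles and the custom map() is used.
class MatchingMapper extends BaseMapper<LongKey, String> {
    @Override
    protected void map(LongKey key, String value) {
        output.add("custom:" + value);
    }
}

class OverloadDemo {
    // Mimics a framework that calls map() through the base type only.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List<String> drive(BaseMapper mapper) {
        mapper.map(new LongKey(), "hello");
        return mapper.output;
    }

    public static void main(String[] args) {
        System.out.println(drive(new MismatchedMapper())); // [identity:hello]
        System.out.println(drive(new MatchingMapper()));   // [custom:hello]
    }
}
```

Adding @Override to MismatchedMapper.map() would turn the mismatch into a compile-time error, since that method does not override anything in the base class.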
                
> IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4507
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4507
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.0.3
>         Environment: linux ubuntu
>            Reporter: Bejoy KS
>
> If we use the default InputFormat (TextInputFormat) but specify the key type in the mapper as IntWritable instead of LongWritable, the framework is supposed to throw a ClassCastException. Such an exception is thrown only if the key types at the class level and the method level are the same (IntWritable). If we instead declare the input key type as IntWritable at the class level but LongWritable at the method level (the map method), the code compiles fine rather than producing a compile-time error. In addition, on execution the framework triggers IdentityMapper instead of the custom mapper provided in the configuration. In this case 'mapreduce.map.class' in job.xml still shows the custom mapper; it should show IdentityMapper whenever IdentityMapper is triggered, to avoid confusion and make debugging easier.
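
The difference between the two cases in the description can be sketched in plain Java. This is a hedged illustration only: IdentityBase, LongOffset, IntOffset and the mapper classes are hypothetical stand-ins, not Hadoop classes, and the sketch assumes the caller hands over a long-style key through the base type. When the key type is the same at class and method level, the compiler-generated bridge method casts the key and fails; when the two differ, the method is only an overload and the base implementation runs silently.

```java
// Hypothetical stand-in key types; these are not Hadoop classes.
class LongOffset { }
class IntOffset { }

// Stand-in base class with an identity-style default map().
class IdentityBase<K, V> {
    String last = "none";

    protected void map(K key, V value) {
        last = "identity";
    }
}

// IntOffset at both class level and method level: a true override.
class ConsistentIntMapper extends IdentityBase<IntOffset, String> {
    @Override
    protected void map(IntOffset key, String value) {
        last = "custom";
    }
}

// IntOffset at class level, LongOffset at method level: only an overload.
class MixedMapper extends IdentityBase<IntOffset, String> {
    protected void map(LongOffset key, String value) {
        last = "custom";
    }
}

class KeyTypeDemo {
    // Passes a LongOffset key to the mapper through the raw base type.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String feed(IdentityBase mapper) {
        try {
            mapper.map(new LongOffset(), "line");
            return mapper.last;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(feed(new ConsistentIntMapper())); // ClassCastException
        System.out.println(feed(new MixedMapper()));         // identity
    }
}
```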
